3.9.2 kernel lockup

Posted: Thu May 16, 2013 7:49 pm
by shadowdaemon
The 3.9.2 kernel locked up on me while booting a usermode Linux system. The usermode kernel is version 3.3.3 and, if I remember correctly, used to work on a 3.3.3 x86 host without too many problems. Previously, this same usermode kernel crashed the host 3.8.10 kernel (invalid opcode, see my other thread). Sometimes the usermode system boots fine, so the problem is intermittent.

Code:
[ 7148.397567] general protection fault: 0000 [#1] SMP
[ 7148.397621] Modules linked in: netconsole snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer snd broadcom tg3 ptp pps_core [last unloaded: netconsole]
[ 7148.397739] CPU 0
[ 7148.397755] Pid: 8727, comm: linux-3.3.3-har Not tainted 3.9.2-grsec #1 Acer             Aspire 5741     /Aspire 5741     
[ 7148.397808] RIP: 0010:[<ffffffff8106752a>]  [<ffffffff8106752a>] ptrace_do_notify+0xaa/0xb0
[ 7148.397858] RSP: 0000:ffff88006baa1ec0  EFLAGS: 00010092
[ 7148.397884] RAX: 0000000000000001 RBX: ffff88006c111530 RCX: 0000000000000000
[ 7148.397917] RDX: ffffffffffffffff RSI: ffff88006c111530 RDI: ffff88006c111530
[ 7148.397950] RBP: 000000006baa1ed8 R08: ffff88006c111970 R09: 0000000000000000
[ 7148.397983] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000085
[ 7148.398016] R13: 00000000ffffffda R14: 0000000000000000 R15: 0000000000000000
[ 7148.398049] FS:  0000000000000000(0000) GS:ffff88006f400000(0063) knlGS:000000004023ab80
[ 7148.398085] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 7148.398113] CR2: 0000000055370d58 CR3: 0000000001c06000 CR4: 00000000000007f0
[ 7148.398147] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7148.398179] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7148.398213] Process linux-3.3.3-har (pid: 8727, threadinfo ffff88006c111970, task ffff88006c111530)
[ 7148.398253] Stack:
[ 7148.398265]  80000000810687a8 000000006baa1f18 00000000ffffffff 000000006baa1f08
[ 7148.398312]  ffffffff8101080e ffffffffffff4111
[ 7148.404779] ---[ end trace 3564728f3403816c ]---
[ 7148.404784] grsec: banning user with uid 1000 until system restart for suspicious kernel crash
[ 7148.404926] grsec: (lonewolf:U:/) special role lonewolf-admin (id 4) exited by /bin/bash[bash:2493] uid/euid:1000/1000 gid/egid:100/100, parent /bin/login[login:1622] uid/euid:0/0 gid/egid:100/100
[ 7170.773022] ------------[ cut here ]------------
[ 7170.773047] WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0xaa/0xd0()
[ 7170.773050] Hardware name: Aspire 5741     
[ 7170.773054] Watchdog detected hard LOCKUP on cpu 1
[ 7170.773057] Modules linked in: netconsole snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer snd broadcom tg3 ptp pps_core [last unloaded: netconsole]
[ 7170.773092] Pid: 2493, comm: bash Tainted: G      D      3.9.2-grsec #1
[ 7170.773095] Call Trace:
[ 7170.773099]  <NMI>  [<ffffffff810536da>] warn_slowpath_common+0x7a/0xc0
[ 7170.773113]  [<ffffffff8105383c>] warn_slowpath_fmt+0x5c/0x70
[ 7170.773119]  [<ffffffff810c910a>] watchdog_overflow_callback+0xaa/0xd0
[ 7170.773126]  [<ffffffff810da88c>] __perf_event_overflow+0xac/0x250
[ 7170.773132]  [<ffffffff810d77d4>] ? perf_event_update_userpage+0x24/0x130
[ 7170.773138]  [<ffffffff810db370>] perf_event_overflow+0x30/0x50
[ 7170.773146]  [<ffffffff81019c52>] intel_pmu_handle_irq+0x242/0x3c0
[ 7170.773152]  [<ffffffff81012ece>] perf_event_nmi_handler+0x2e/0x40
[ 7170.773158]  [<ffffffff81006a7e>] nmi_handle.isra.0+0x5e/0x90
[ 7170.773163]  [<ffffffff81006c49>] do_nmi+0x199/0x350
[ 7170.773171]  [<ffffffff817c7266>] end_repeat_nmi+0x34/0x44
[ 7170.773179]  [<ffffffff81322f98>] ? __write_lock_failed+0x18/0x40
[ 7170.773184]  [<ffffffff81322f98>] ? __write_lock_failed+0x18/0x40
[ 7170.773190]  [<ffffffff81322f98>] ? __write_lock_failed+0x18/0x40
[ 7170.773193]  <<EOE>>
[ 7181.099357] ------------[ cut here ]------------
[ 7181.099370] WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0xaa/0xd0()
[ 7181.099374] Hardware name: Aspire 5741     
[ 7181.099377] Watchdog detected hard LOCKUP on cpu 3
[ 7181.099380] Modules linked in: netconsole snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer snd broadcom tg3 ptp pps_core [last unloaded: netconsole]
[ 7181.099411] Pid: 2961, comm: xterm Tainted: G      D W    3.9.2-grsec #1
[ 7181.099414] Call Trace:
[ 7181.099417]  <NMI>  [<ffffffff810536da>] warn_slowpath_common+0x7a/0xc0
[ 7181.099430]  [<ffffffff8105383c>] warn_slowpath_fmt+0x5c/0x70
[ 7181.099435]  [<ffffffff810c910a>] watchdog_overflow_callback+0xaa/0xd0
[ 7181.099441]  [<ffffffff810da88c>] __perf_event_overflow+0xac/0x250
[ 7181.099447]  [<ffffffff810d77d4>] ? perf_event_update_userpage+0x24/0x130
[ 7181.099452]  [<ffffffff810db370>] perf_event_overflow+0x30/0x50
[ 7181.099459]  [<ffffffff81019c52>] intel_pmu_handle_irq+0x242/0x3c0
[ 7181.099464]  [<ffffffff81012ece>] perf_event_nmi_handler+0x2e/0x40
[ 7181.099470]  [<ffffffff81006a7e>] nmi_handle.isra.0+0x5e/0x90
[ 7181.099475]  [<ffffffff81006c49>] do_nmi+0x199/0x350
[ 7181.099482]  [<ffffffff817c7266>] end_repeat_nmi+0x34/0x44
[ 7181.099488]  [<ffffffff81322f98>] ? __write_lock_failed+0x18/0x40
[ 7181.099493]  [<ffffffff81322f98>] ? __write_lock_failed+0x18/0x40
[ 7181.099499]  [<ffffffff81322f98>] ? __write_lock_failed+0x18/0x40
[ 7181.099502]  <<EOE>>  [<ffffffff817c6529>] _raw_write_lock_irq+0x29/0x40
[ 7181.099512]  [<ffffffff81059c89>] do_exit+0x349/0xa30
[ 7181.099519]  [<ffffffff8111d9da>] ? kmem_cache_free+0xfa/0x110
[ 7193.055241] ------------[ cut here ]------------
[ 7193.055249] WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0xaa/0xd0()
[ 7193.055253] Hardware name: Aspire 5741     
[ 7193.055255] Watchdog detected hard LOCKUP on cpu 2
[ 7193.055258] Modules linked in: netconsole snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer snd broadcom tg3 ptp pps_core [last unloaded: netconsole]
[ 7193.055288] Pid: 8513, comm: linux-3.3.3-har Tainted: G      D W    3.9.2-grsec #1
[ 7193.055291] Call Trace:
[ 7193.055294]  <NMI>  [<ffffffff810536da>] warn_slowpath_common+0x7a/0xc0
[ 7193.055305]  [<ffffffff8105383c>] warn_slowpath_fmt+0x5c/0x70
[ 7193.055310]  [<ffffffff810c910a>] watchdog_overflow_callback+0xaa/0xd0
[ 7193.055316]  [<ffffffff810da88c>] __perf_event_overflow+0xac/0x250
[ 7193.055321]  [<ffffffff810d77d4>] ? perf_event_update_userpage+0x24/0x130
[ 7193.055327]  [<ffffffff810db370>] perf_event_overflow+0x30/0x50
[ 7193.055332]  [<ffffffff81019c52>] intel_pmu_handle_irq+0x242/0x3c0
[ 7193.055338]  [<ffffffff81012ece>] perf_event_nmi_handler+0x2e/0x40
[ 7193.055343]  [<ffffffff81006a7e>] nmi_handle.isra.0+0x5e/0x90
[ 7193.055348]  [<ffffffff81006c49>] do_nmi+0x199/0x350
[ 7193.055353]  [<ffffffff817c7266>] end_repeat_nmi+0x34/0x44
[ 7193.055359]  [<ffffffff817c6802>] ? _raw_spin_lock_irq+0x22/0x40
[ 7193.055364]  [<ffffffff817c6802>] ? _raw_spin_lock_irq+0x22/0x40
[ 7193.055370]  [<ffffffff817c6802>] ? _raw_spin_lock_irq+0x22/0x40
[ 7193.055373]  <<EOE>>  [<ffffffff810615ad>] ptrace_check_attach+0xed/0x170
[ 7193.055509]  [<ffffffff81063900>] compat_sys_ptrace+0x110/0x170
[ 7193.055515]  [<ffffffff817c8c24>] sysenter_dispatch+0x7/0x24
[ 7193.055520] ---[ end trace 3564728f3403816f ]---
[ 7204.348302] ------------[ cut here ]------------
[ 7204.348310] WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0xaa/0xd0()
[ 7204.348313] Hardware name: Aspire 5741     
[ 7204.348315] Watchdog detected hard LOCKUP on cpu 0
[ 7204.348318] Modules linked in: netconsole snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer snd broadcom tg3 ptp pps_core [last unloaded: netconsole]
[ 7204.348350] Pid: 8727, comm: linux-3.3.3-har Tainted: G      D W    3.9.2-grsec #1
[ 7204.348353] Call Trace:
[ 7204.348356]  <NMI>  [<ffffffff810536da>] warn_slowpath_common+0x7a/0xc0
[ 7204.348366]  [<ffffffff8105383c>] warn_slowpath_fmt+0x5c/0x70
[ 7204.348372]  [<ffffffff810c910a>] watchdog_overflow_callback+0xaa/0xd0
[ 7204.348377]  [<ffffffff810da88c>] __perf_event_overflow+0xac/0x250
[ 7204.348382]  [<ffffffff810d77d4>] ? perf_event_update_userpage+0x24/0x130
[ 7204.348388]  [<ffffffff810db370>] perf_event_overflow+0x30/0x50
[ 7204.348393]  [<ffffffff81019c52>] intel_pmu_handle_irq+0x242/0x3c0
[ 7204.348399]  [<ffffffff81012ece>] perf_event_nmi_handler+0x2e/0x40
[ 7204.348404]  [<ffffffff81006a7e>] nmi_handle.isra.0+0x5e/0x90
[ 7204.348409]  [<ffffffff81006c49>] do_nmi+0x199/0x350
[ 7204.348414]  [<ffffffff817c7266>] end_repeat_nmi+0x34/0x44
[ 7204.348422]  [<ffffffff817c6460>] ? _raw_spin_lock_irqsave+0x20/0x40
[ 7204.348427]  [<ffffffff817c6460>] ? _raw_spin_lock_irqsave+0x20/0x40
[ 7204.348432]  [<ffffffff817c6460>] ? _raw_spin_lock_irqsave+0x20/0x40
[ 7204.348435]  <<EOE>>
[ 7204.348438] ---[ end trace 3564728f34038170 ]---
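
The trace above reports a hard lockup on each of the four CPUs (1, 3, 2, then 0) within roughly 34 seconds, starting about 22 seconds after the general protection fault. For longer netconsole captures, a short script can pull out that sequence. This is only a minimal sketch: the function name `summarize_lockups` and the regular expression are my own, and the pattern assumes the watchdog message is worded exactly as in this log.

```python
import re

# Matches lines such as:
#   [ 7170.773054] Watchdog detected hard LOCKUP on cpu 1
# The wording is taken from the log above; other kernels may phrase it differently.
LOCKUP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+Watchdog detected hard LOCKUP on cpu (\d+)"
)


def summarize_lockups(log_text):
    """Return a list of (timestamp_seconds, cpu) tuples in log order."""
    return [(float(ts), int(cpu)) for ts, cpu in LOCKUP_RE.findall(log_text)]


# The four watchdog lines copied from the trace above.
sample = """\
[ 7170.773054] Watchdog detected hard LOCKUP on cpu 1
[ 7181.099377] Watchdog detected hard LOCKUP on cpu 3
[ 7193.055255] Watchdog detected hard LOCKUP on cpu 2
[ 7204.348315] Watchdog detected hard LOCKUP on cpu 0
"""

for ts, cpu in summarize_lockups(sample):
    print(f"{ts:.6f}s: hard lockup on cpu {cpu}")
```

Feeding the full netconsole capture to `summarize_lockups` instead of `sample` should give the same list, since lines that do not match the pattern are ignored.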

Re: 3.9.2 kernel lockup

Posted: Sat Jun 08, 2013 12:27 am
by shadowdaemon
As a test, I ran vanilla 3.9.2 with the usermode system as a guest for about a week and didn't see any kernel panics. I didn't exercise the guest system itself very much, however. After updating the host kernel to 3.9.4-r1 with the hardened Gentoo patchset, I had a kernel panic within about an hour. I used netconsole to gather the logs, but a bit appears to be missing and I'm not sure why.

Code:
[66904.538385] ------------[ cut here ]------------
[66904.538433] kernel BUG at /usr/src/linux-3.9.4-hardened-r1/arch/x86/include/asm/pgtable.h:100!
[66904.538474] invalid opcode: 0000 [#1] SMP
[66904.538505] Modules linked in: netconsole usb_storage snd_hda_intel snd_hda_codec snd_pcm snd_page_alloc snd_timer snd broadcom tg3 ptp pps_core
[66904.538615] CPU 0
[66904.538631] Pid: 0, comm: swapper/0 Not tainted 3.9.4-hardened-r1 #2 Acer             Aspire 5741     /Aspire 5741     
[66904.538681] RIP: 0010:[<ffffffff810838c4>]  [<ffffffff810838c4>] native_pax_open_kernel+0x24/0x30
[66904.538732] RSP: 0018:ffffffff81c01e78  EFLAGS: 00010006
[66904.538758] RAX: 000000008004003b RBX: ffff88006f40e500 RCX: 0000000000000001
[66904.538790] RDX: 000000008005003b RSI: ffff88006c247100 RDI: ffffffff81c19440
[66904.538823] RBP: ffffffff81c01e78 R08: 0000000000000001 R09: 0000000000000001
[66904.538855] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880013f1df80
[66904.538887] R13: ffff880013f1df80 R14: 0000000000000000 R15: 0000000000000000
[66904.538920] FS:  0000000000000000(0000) GS:ffff88006f400000(0000) knlGS:0000000000000000
[66904.538957] CS:  0038 DS: 0000 ES: 0000 CR0: 000000008005003b
[66904.538984] CR2: 00000000553a5470 CR3: 0000000001c06000 CR4: 00000000000007f0
[66904.539017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[66904.539050] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[66904.539082] Process swapper/0 (pid: 0, threadinfo ffffffff81c19880, task ffffffff81c19440)
[66904.539118] Stack:
[66904.539131]  ffffffff81c01ed8 ffffffff817c738d ffffffff8108b860 ffffffff81c19880
[66904.539177]  ffff88006c247100
[66904.539360]  [<ffffffff817c7554>] schedule+0x24/0x70
[66904.539519]  [<ffffffff81e1090f>] 0xffffffff81e1090e
[66904.545710] ---[ end trace 20654e3c41004d1d ]---
[66904.545714] Kernel panic - not syncing: grsec: halting the system due to suspicious kernel crash caused by root
[66904.545810] drm_kms_helper: panic occurred, switching back to text console
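
One detail in the register dump above: RAX (000000008004003b) and the CR0/RDX value (000000008005003b) differ in a single bit, bit 16, which is the WP (write protect) flag in the x86 CR0 register. The sketch below just decodes the two values using the architecturally defined CR0 bit positions. Whether RAX really holds a CR0 value at the point of the BUG is an assumption based only on how similar the numbers are, and `decode_cr0` is a made-up helper name, not anything from the kernel.

```python
# Architecturally defined x86 CR0 flag bits (name -> bit position).
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def decode_cr0(value):
    """Return the names of the CR0 flags set in value, lowest bit first."""
    return [name for name, bit in CR0_BITS.items() if value & (1 << bit)]


rax = 0x8004003B  # RAX in the dump (assumed here to be a CR0-style value)
cr0 = 0x8005003B  # CR0 (and RDX) in the same dump

print("CR0:", decode_cr0(cr0))
print("RAX:", decode_cr0(rax))
print("difference:", hex(rax ^ cr0))
```

The only flag present in the CR0 value but absent from RAX is WP, which at least fits a crash inside native_pax_open_kernel.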

Re: 3.9.2 kernel lockup

Posted: Sun Jun 09, 2013 12:12 am
by shadowdaemon
I disabled UDEREF on 3.9.4 and the problem seems to have gone away. I hope so, anyway. :D

Re: 3.9.2 kernel lockup

Posted: Sun Jul 14, 2013 6:57 pm
by PaX Team
since then i changed the cr0.wp handling code a bit, can you retest with a recent (3.10) kernel?