Call trace / "rcu_sched self-detected stall on CPU" in dmesg
Posted: Mon Feb 22, 2016 6:39 am
Hi,
I have a problem here that on a server with a relatively recent grsecurity kernel we got call traces in the dmesg and subsequently the system ran extremely unstable or just crashed.
The dmesg messages looked like the one below (but we got dozends of them).
This was a 4.3.5 kernel with patch
grsecurity-3.1-4.3.5-201602092235.patch
(I haven't tested the latest 4.4 patch yet)
The issue didn't appear reliably, it happened after a while (and on another almost identical system it didn't happen, or maybe just didn't happen yet).
I can only give limited information as it's difficult to debug with the issue not being easily reproducible. But I haven't found any report with a similar stack trace in relation to grsecurity (although I don't 100% know if it's a grsecurity issue, maybe a kernel issue).
[352155.140013] ffff88002b55c000 ffffffff815c14e0 ffff88002b55c000 ffff880c3fd43e90
[352155.140013] Call Trace:
[352155.140013] <IRQ> [<ffffffff810997ea>] ? rcu_dump_cpu_stacks+0x62/0x81
[352155.140013] [<ffffffff8109baf4>] ? rcu_check_callbacks+0x213/0x5c6
[352155.140013] [<ffffffff8109dc0a>] ? update_process_times+0x23/0x45
[352155.140013] [<ffffffff810a94d2>] ? tick_sched_timer+0x2f/0x53
[352155.140013] [<ffffffff8109e082>] ? __hrtimer_run_queues+0x81/0x116
[352155.140013] [<ffffffff810a3ff6>] ? ktime_get_update_offsets_now+0x43/0xd4
[352155.140013] [<ffffffff8109e734>] ? hrtimer_interrupt+0x8e/0x160
[352155.140013] [<ffffffff810273f8>] ? smp_apic_timer_interrupt+0x34/0x43
[352155.140013] [<ffffffff813ee5ee>] ? apic_timer_interrupt+0x8e/0xa0
[352155.140013] <EOI> [<ffffffff8108f24e>] ? queued_spin_lock_slowpath+0x10/0x16b
[352155.140013] [<ffffffff8110aab3>] ? dput+0x79/0x1e3
[352155.140013] [<ffffffff810ff241>] ? lookup_real+0x32/0x3b
[352155.140013] [<ffffffff811025df>] ? path_openat+0x610/0xd7d
[352155.140013] [<ffffffff81101850>] ? path_lookupat+0xc6/0x114
[352155.140013] [<ffffffff81102d94>] ? do_filp_open+0x48/0x9e
[352155.140013] [<ffffffff811aa770>] ? security_ptrace_access_check+0x23/0x33
[352155.140013] [<ffffffff81063697>] ? get_task_mm+0x11/0x45
[352155.140013] [<ffffffff811434ad>] ? proc_exe_link+0x34/0x9a
[352155.140013] [<ffffffff811f284a>] ? __list_add+0x12/0x2b
[352155.140013] [<ffffffff8110fe31>] ? __alloc_fd+0x61/0x108
[352155.140013] [<ffffffff810f6a9f>] ? do_sys_open+0x146/0x1cd
[352155.140013] [<ffffffff810f6a9f>] ? do_sys_open+0x146/0x1cd
[352155.140013] [<ffffffff813ed8cb>] ? entry_SYSCALL_64_fastpath+0x12/0x7a
[352335.160005] INFO: rcu_sched self-detected stall on CPU
[352335.160005] 5: (2940276 ticks this GP) idle=053/140000000000001/0 softirq=18046414/18046475 fqs=2900475
[352335.160005] (t=2940564 jiffies g=13801167 c=13801166 q=60311990)
[352335.160005] Task dump for CPU 5:
[352335.160005] syslog-ng R running task 0 2070 1376 0x00000008
[352335.160005] 000000000000001d ffffffff810997ea ffffffff815833c0 ffff880c3fd51bc0
[352335.160005] ffff880094abfc08 ffff880c3fd4ce80 ffffffff8109baf4 00000000039849b6
[352335.160005] ffff88002b55c000 ffffffff815c14e0 ffff88002b55c000 ffff880c3fd43e90
I have a problem here that on a server with a relatively recent grsecurity kernel we got call traces in the dmesg and subsequently the system ran extremely unstable or just crashed.
The dmesg messages looked like the one below (but we got dozends of them).
This was a 4.3.5 kernel with patch
grsecurity-3.1-4.3.5-201602092235.patch
(I haven't tested the latest 4.4 patch yet)
The issue didn't appear reliably, it happened after a while (and on another almost identical system it didn't happen, or maybe just didn't happen yet).
I can only give limited information as it's difficult to debug with the issue not being easily reproducible. But I haven't found any report with a similar stack trace in relation to grsecurity (although I don't 100% know if it's a grsecurity issue, maybe a kernel issue).
[352155.140013] ffff88002b55c000 ffffffff815c14e0 ffff88002b55c000 ffff880c3fd43e90
[352155.140013] Call Trace:
[352155.140013] <IRQ> [<ffffffff810997ea>] ? rcu_dump_cpu_stacks+0x62/0x81
[352155.140013] [<ffffffff8109baf4>] ? rcu_check_callbacks+0x213/0x5c6
[352155.140013] [<ffffffff8109dc0a>] ? update_process_times+0x23/0x45
[352155.140013] [<ffffffff810a94d2>] ? tick_sched_timer+0x2f/0x53
[352155.140013] [<ffffffff8109e082>] ? __hrtimer_run_queues+0x81/0x116
[352155.140013] [<ffffffff810a3ff6>] ? ktime_get_update_offsets_now+0x43/0xd4
[352155.140013] [<ffffffff8109e734>] ? hrtimer_interrupt+0x8e/0x160
[352155.140013] [<ffffffff810273f8>] ? smp_apic_timer_interrupt+0x34/0x43
[352155.140013] [<ffffffff813ee5ee>] ? apic_timer_interrupt+0x8e/0xa0
[352155.140013] <EOI> [<ffffffff8108f24e>] ? queued_spin_lock_slowpath+0x10/0x16b
[352155.140013] [<ffffffff8110aab3>] ? dput+0x79/0x1e3
[352155.140013] [<ffffffff810ff241>] ? lookup_real+0x32/0x3b
[352155.140013] [<ffffffff811025df>] ? path_openat+0x610/0xd7d
[352155.140013] [<ffffffff81101850>] ? path_lookupat+0xc6/0x114
[352155.140013] [<ffffffff81102d94>] ? do_filp_open+0x48/0x9e
[352155.140013] [<ffffffff811aa770>] ? security_ptrace_access_check+0x23/0x33
[352155.140013] [<ffffffff81063697>] ? get_task_mm+0x11/0x45
[352155.140013] [<ffffffff811434ad>] ? proc_exe_link+0x34/0x9a
[352155.140013] [<ffffffff811f284a>] ? __list_add+0x12/0x2b
[352155.140013] [<ffffffff8110fe31>] ? __alloc_fd+0x61/0x108
[352155.140013] [<ffffffff810f6a9f>] ? do_sys_open+0x146/0x1cd
[352155.140013] [<ffffffff810f6a9f>] ? do_sys_open+0x146/0x1cd
[352155.140013] [<ffffffff813ed8cb>] ? entry_SYSCALL_64_fastpath+0x12/0x7a
[352335.160005] INFO: rcu_sched self-detected stall on CPU
[352335.160005] 5: (2940276 ticks this GP) idle=053/140000000000001/0 softirq=18046414/18046475 fqs=2900475
[352335.160005] (t=2940564 jiffies g=13801167 c=13801166 q=60311990)
[352335.160005] Task dump for CPU 5:
[352335.160005] syslog-ng R running task 0 2070 1376 0x00000008
[352335.160005] 000000000000001d ffffffff810997ea ffffffff815833c0 ffff880c3fd51bc0
[352335.160005] ffff880094abfc08 ffff880c3fd4ce80 ffffffff8109baf4 00000000039849b6
[352335.160005] ffff88002b55c000 ffffffff815c14e0 ffff88002b55c000 ffff880c3fd43e90