Call trace / "rcu_sched self-detected stall on CPU" in dmesg

Discuss usability issues, general maintenance, and general support issues for a grsecurity-enabled system.

Call trace / "rcu_sched self-detected stall on CPU" in dmesg

Postby hanno » Mon Feb 22, 2016 6:39 am

Hi,

I have a problem here that on a server with a relatively recent grsecurity kernel we got call traces in the dmesg and subsequently the system ran extremely unstable or just crashed.

The dmesg messages looked like the one below (but we got dozends of them).

This was a 4.3.5 kernel with patch
grsecurity-3.1-4.3.5-201602092235.patch
(I haven't tested the latest 4.4 patch yet)

The issue didn't appear reliably, it happened after a while (and on another almost identical system it didn't happen, or maybe just didn't happen yet).
I can only give limited information as it's difficult to debug with the issue not being easily reproducible. But I haven't found any report with a similar stack trace in relation to grsecurity (although I don't 100% know if it's a grsecurity issue, maybe a kernel issue).


[352155.140013] ffff88002b55c000 ffffffff815c14e0 ffff88002b55c000 ffff880c3fd43e90
[352155.140013] Call Trace:
[352155.140013] <IRQ> [<ffffffff810997ea>] ? rcu_dump_cpu_stacks+0x62/0x81
[352155.140013] [<ffffffff8109baf4>] ? rcu_check_callbacks+0x213/0x5c6
[352155.140013] [<ffffffff8109dc0a>] ? update_process_times+0x23/0x45
[352155.140013] [<ffffffff810a94d2>] ? tick_sched_timer+0x2f/0x53
[352155.140013] [<ffffffff8109e082>] ? __hrtimer_run_queues+0x81/0x116
[352155.140013] [<ffffffff810a3ff6>] ? ktime_get_update_offsets_now+0x43/0xd4
[352155.140013] [<ffffffff8109e734>] ? hrtimer_interrupt+0x8e/0x160
[352155.140013] [<ffffffff810273f8>] ? smp_apic_timer_interrupt+0x34/0x43
[352155.140013] [<ffffffff813ee5ee>] ? apic_timer_interrupt+0x8e/0xa0
[352155.140013] <EOI> [<ffffffff8108f24e>] ? queued_spin_lock_slowpath+0x10/0x16b
[352155.140013] [<ffffffff8110aab3>] ? dput+0x79/0x1e3
[352155.140013] [<ffffffff810ff241>] ? lookup_real+0x32/0x3b
[352155.140013] [<ffffffff811025df>] ? path_openat+0x610/0xd7d
[352155.140013] [<ffffffff81101850>] ? path_lookupat+0xc6/0x114
[352155.140013] [<ffffffff81102d94>] ? do_filp_open+0x48/0x9e
[352155.140013] [<ffffffff811aa770>] ? security_ptrace_access_check+0x23/0x33
[352155.140013] [<ffffffff81063697>] ? get_task_mm+0x11/0x45
[352155.140013] [<ffffffff811434ad>] ? proc_exe_link+0x34/0x9a
[352155.140013] [<ffffffff811f284a>] ? __list_add+0x12/0x2b
[352155.140013] [<ffffffff8110fe31>] ? __alloc_fd+0x61/0x108
[352155.140013] [<ffffffff810f6a9f>] ? do_sys_open+0x146/0x1cd
[352155.140013] [<ffffffff810f6a9f>] ? do_sys_open+0x146/0x1cd
[352155.140013] [<ffffffff813ed8cb>] ? entry_SYSCALL_64_fastpath+0x12/0x7a
[352335.160005] INFO: rcu_sched self-detected stall on CPU
[352335.160005] 5: (2940276 ticks this GP) idle=053/140000000000001/0 softirq=18046414/18046475 fqs=2900475
[352335.160005] (t=2940564 jiffies g=13801167 c=13801166 q=60311990)
[352335.160005] Task dump for CPU 5:
[352335.160005] syslog-ng R running task 0 2070 1376 0x00000008
[352335.160005] 000000000000001d ffffffff810997ea ffffffff815833c0 ffff880c3fd51bc0
[352335.160005] ffff880094abfc08 ffff880c3fd4ce80 ffffffff8109baf4 00000000039849b6
[352335.160005] ffff88002b55c000 ffffffff815c14e0 ffff88002b55c000 ffff880c3fd43e90
hanno
 
Posts: 26
Joined: Thu Dec 16, 2004 4:37 am

Re: Call trace / "rcu_sched self-detected stall on CPU" in d

Postby PaX Team » Tue Feb 23, 2016 1:36 pm

a couple of things:

1. try to reproduce it with a vanilla kernel if you haven't already (preferably 4.4 as we already moved to it)
2. enable frame pointers in your .config for a better backtrace
3. try sysrq-l to get a backtrace of each CPU
4. it seems that one of the CPUs is trying to get an already taken spinlock, perhaps try to enable lock debugging and see if it detects anything
5. are there any previous kernel logs that may indicate a problem?
PaX Team
 
Posts: 2310
Joined: Mon Mar 18, 2002 4:35 pm

Re: Call trace / "rcu_sched self-detected stall on CPU" in d

Postby hanno » Fri Feb 26, 2016 6:06 pm

I have updated the kernel to 4.4.2 + latest grsec patch (and did what you recommended - frame pointers etc.).
Until now the issue didn't show up again, so I hope it is somehow fixed by the new kernel or the new patch.
hanno
 
Posts: 26
Joined: Thu Dec 16, 2004 4:37 am


Return to grsecurity support