Possible grsecurity deadlock


Postby nelhage » Tue Jan 22, 2008 10:11 pm

Hello,

We've been having problems with deadlocks every few months on a
popular (~100 concurrent users) ssh server running grsecurity over the
last year and a half (Linux kernel versions ranging from 2.6.16
to 2.6.23). Recently, we recompiled our kernel with spinlock
debugging and set up a serial console to the machine, and determined
that three of the processors were in spinlock lockup and the fourth was
not responding. Investigation of the code suggests that the lock all
three processors were spinning on was the sighand->siglock of some
process's signal-handler struct. The complete backtraces we have
from these CPUs are below.

A brief search of the grsecurity patch suggests that grsecurity has
some large blocks of code involved in logging signals while holding
the siglock (in particular, force_sig_info calls gr_log_signal, which
generates and potentially prints log messages, with the siglock held).
Our current best guess is that some race condition or rarely exercised
code path in this logging code is causing these deadlocks.

These deadlocks are typically immediately preceded by waves of
grsecurity segfault notifications in the system logs. (We are aware
that these segfaults are not generated by grsecurity itself.)

We are not entirely certain that grsecurity is at fault here, because
we are also running an openafs kernel module on the machine, but AFS
only has a small number of ~6-line blocks of code holding the siglock,
and they all seem to be safe. Grsecurity thus seems more likely to be
responsible.

Let us know if any other information would be useful.

[I sent this message to grsecurity@grsecurity.net about a month ago; we received no reply, and the archives at http://grsecurity.net/pipermail/grsecurity/ trail off in October, so we're not sure if that list is still active]

Code:

[Fri Nov 23 16:09:17 2007]BUG: spinlock lockup on CPU#1, fileziOSyb/28366, ffff81019fbdd108
[Fri Nov 23 16:09:17 2007]
[Fri Nov 23 16:09:17 2007]Call Trace:
[Fri Nov 23 16:09:17 2007] [<ffffffff803458c7>] _raw_spin_lock+0x117/0x150
[Fri Nov 23 16:09:17 2007] [<ffffffff8052f610>] _spin_lock+0x50/0x70
[Fri Nov 23 16:09:17 2007] [<ffffffff802314a1>] release_task+0x91/0x340
[Fri Nov 23 16:09:17 2007] [<ffffffff80232d2e>] do_exit+0x84e/0x990
[Fri Nov 23 16:09:17 2007] [<ffffffff8052daa9>] mutex_unlock+0x9/0x10
[Fri Nov 23 16:09:17 2007] [<ffffffff80232f33>] sys_exit+0x13/0x20
[Fri Nov 23 16:09:17 2007] [<ffffffff8021d472>] ia32_sysret+0x0/0xa
[Fri Nov 23 16:09:17 2007]
[Fri Nov 23 16:09:17 2007]BUG: spinlock lockup on CPU#0, fileziOSyb/28359, ffff81019fbdd108
[Fri Nov 23 16:09:17 2007]
[Fri Nov 23 16:09:17 2007]Call Trace:
[Fri Nov 23 16:09:17 2007] [<ffffffff803458c7>] _raw_spin_lock+0x117/0x150
[Fri Nov 23 16:09:17 2007] [<ffffffff8052f9f1>] _spin_lock_irq+0x51/0x70
[Fri Nov 23 16:09:17 2007] [<ffffffff802392a8>] sigprocmask+0x38/0xf0
[Fri Nov 23 16:09:17 2007] [<ffffffff8023c31e>] sys_rt_sigprocmask+0x7e/0x100
[Fri Nov 23 16:09:17 2007] [<ffffffff8021e107>] sys32_rt_sigprocmask+0x77/0x110
[Fri Nov 23 16:09:17 2007] [<ffffffff8021deb6>] sys32_mmap2+0x76/0xf0
[Fri Nov 23 16:09:17 2007] [<ffffffff8021d472>] ia32_sysret+0x0/0xa
[Fri Nov 23 16:09:17 2007]
[Fri Nov 23 16:10:53 2007]BUG: spinlock lockup on CPU#3, fileziOSyb/2959, ffff81019fbdd108
[Fri Nov 23 16:10:53 2007]
[Fri Nov 23 16:10:53 2007]Call Trace:
[Fri Nov 23 16:10:53 2007] [<ffffffff803458c7>] _raw_spin_lock+0x117/0x150
[Fri Nov 23 16:10:53 2007] [<ffffffff8052f9f1>] _spin_lock_irq+0x51/0x70
[Fri Nov 23 16:10:53 2007] [<ffffffff802392a8>] sigprocmask+0x38/0xf0
[Fri Nov 23 16:10:53 2007] [<ffffffff8023c31e>] sys_rt_sigprocmask+0x7e/0x100
[Fri Nov 23 16:10:53 2007] [<ffffffff80251a91>] compat_sys_futex+0x71/0x110
[Fri Nov 23 16:10:53 2007] [<ffffffff8021e107>] sys32_rt_sigprocmask+0x77/0x110
[Fri Nov 23 16:10:53 2007] [<ffffffff8021d472>] ia32_sysret+0x0/0xa
[Fri Nov 23 16:10:53 2007]
nelhage
 
Posts: 1
Joined: Tue Jan 22, 2008 10:02 pm

Re: Possible grsecurity deadlock

Postby PaX Team » Wed Jan 23, 2008 8:28 am

nelhage wrote:A brief search of the grsecurity patch suggests that grsecurity has
some large blocks of code involved in logging signals while holding
the siglock (in particular, force_sig_info calls gr_log_signal, which
generates and potentially prints log messages, with the siglock held).
Our current best guess is that some race condition or rarely exercised
code path in this logging code is causing these deadlocks.

thanks for the report, a quick look suggests that there's actually an AB-BA type lock inversion with the tasklist_lock, and maybe others as well. i think the whole log generation code needs a reorganization so that a lot less work is done with interrupts disabled.
nelhage wrote:[I sent this message to grsecurity@grsecurity.net about a month ago; We received no reply, and the archives at http://grsecurity.net/pipermail/grsecurity/ trail off in October, so we're not sure if that list is still active]

hmm true (shows how much we use the forum vs. the list), we'll fix it.
PaX Team
 
Posts: 2310
Joined: Mon Mar 18, 2002 4:35 pm

Re: Possible grsecurity deadlock

Postby spender » Wed Jan 23, 2008 5:52 pm

Thanks for the report. I've uploaded new patches for 2.4 and 2.6 that should resolve this problem. Let me know if you still experience the deadlock. I've also restarted the mailman service (it seems to have crashed some time in December).

-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

