Random hard lockups under cpu load

Post by ramereth » Tue Nov 01, 2011 11:51 am

I've been having an ongoing problem since around 2.6.32 where some of our machines that are under a high CPU load get into a hard locked state. We get no output anywhere and it's been extremely difficult to get any sort of output from the kernel. So far we've figured out that it seems to only happen on our grsec kernels, and we're currently running 2.6.39-hardened-r8 on Gentoo Hardened. Are there any known potential issues where grsec will hard lock a machine under high CPU load? What's the best method for getting debug output on a hardened kernel? I'm at a loss as to what to do next.

Thanks-
ramereth
 
Posts: 2
Joined: Tue Nov 01, 2011 11:35 am

Re: Random hard lockups under cpu load

Post by PaX Team » Thu Nov 03, 2011 6:57 am

ramereth wrote: I've been having an ongoing problem since around 2.6.32 where some of our machines that are under a high CPU load get into a hard locked state. We get no output anywhere and it's been extremely difficult to get any sort of output from the kernel. So far we've figured out that it seems to only happen on our grsec kernels, and we're currently running 2.6.39-hardened-r8 on Gentoo Hardened. Are there any known potential issues where grsec will hard lock a machine under high CPU load? What's the best method for getting debug output on a hardened kernel? I'm at a loss as to what to do next.
if these lockups are more or less reproducible, can you try one of the currently supported grsec patches (2.6.32 or 3.0, soon 3.1, instead of .39)? second, what is your .config (in particular, is it 32/64 bit x86? what PaX features are on, etc)? as for debugging, you should probably enable some lock debugging options (if your workload can tolerate the associated performance impact) and the NMI watchdog. also logging through netconsole may be able to catch the resulting logs but if you can attach a monitor and take a shot that'll do too.
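For the netconsole suggestion, the receiving side can be as simple as a UDP listener on another machine. The following is a minimal sketch, not a tested setup for this case: it assumes netconsole is sending to its default target port (6666) and that `netconsole.log` is an acceptable place to write. The function names `open_listener` and `receive_messages` are made up for illustration. Each datagram is flushed to disk immediately so the last lines before a lockup are not lost in a buffer.

```python
import socket

# Assumption: netconsole's default target port; change it if the sender overrides it.
NETCONSOLE_PORT = 6666


def open_listener(host="0.0.0.0", port=NETCONSOLE_PORT):
    """Bind a UDP socket that netconsole packets can be sent to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, log_path, max_packets=None):
    """Append each received datagram to log_path, prefixed with the sender IP.

    Runs forever when max_packets is None; otherwise returns after that
    many datagrams. Returns the number of datagrams written.
    """
    received = 0
    with open(log_path, "ab") as log:
        while max_packets is None or received < max_packets:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            log.write(sender_ip.encode() + b" " + data)
            if not data.endswith(b"\n"):
                log.write(b"\n")
            log.flush()  # keep the last lines before a lockup on disk
            received += 1
    return received


if __name__ == "__main__":
    receive_messages(open_listener(), "netconsole.log")
```

Run it on the logging host before starting the workload. A hard lockup can still stop the kernel before anything is sent, so an empty log does not rule out a kernel-side problem.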
PaX Team
 
Posts: 2310
Joined: Mon Mar 18, 2002 4:35 pm

Re: Random hard lockups under cpu load

Post by ramereth » Fri Nov 04, 2011 12:34 pm

PaX Team wrote: if these lockups are more or less reproducible, can you try one of the currently supported grsec patches (2.6.32 or 3.0, soon 3.1, instead of .39)? second, what is your .config (in particular, is it 32/64 bit x86? what PaX features are on, etc)? as for debugging, you should probably enable some lock debugging options (if your workload can tolerate the associated performance impact) and the NMI watchdog. also logging through netconsole may be able to catch the resulting logs but if you can attach a monitor and take a shot that'll do too.


They are fairly reproducible when I do several concurrent stage4 builds (i.e. using chroots extensively) on 64-bit. I've been able to narrow the issue down: it only trips when I have some of the grsec chroot features enabled. Here's the list of features I have enabled:

Code:
kernel.grsecurity.chroot_deny_shmat = 1
kernel.grsecurity.chroot_deny_unix = 1
kernel.grsecurity.chroot_deny_mount = 1
kernel.grsecurity.chroot_deny_fchdir = 1
kernel.grsecurity.chroot_deny_chroot = 1
kernel.grsecurity.chroot_deny_pivot = 1
kernel.grsecurity.chroot_enforce_chdir = 1
kernel.grsecurity.chroot_deny_mknod = 1
kernel.grsecurity.chroot_restrict_nice = 1
kernel.grsecurity.chroot_deny_sysctl = 1
kernel.grsecurity.chroot_findtask = 1


I have had no luck getting output from netconsole when the machine locks up. I'm going to try to narrow down which chroot feature is actually causing it. I'll also try 3.0.x and get you the kernel config soon. If chroot is the culprit, that would match the hosts that appear to have this problem. Our FTP server uses chroot in its rsyncd config, so I suspect some syncs may be triggering it. I haven't seen this on many hosts other than those that use chroots and sometimes run under high load.
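One way to narrow down the list above is to bisect it: enable half of the features, run the build workload, and keep whichever half still locks up. The sketch below only illustrates that plan and rests on two assumptions: a single feature is responsible, and the lockup reproduces reliably enough that a clean run means the enabled half is innocent. `locks_up` is a hypothetical callback standing in for "apply these settings, run the stage4 builds, report whether the box hung", which in practice is a manual step. `sysctl_lines` and `bisect_suspects` are illustrative names, not part of any tool.

```python
CHROOT_SYSCTLS = [
    "kernel.grsecurity.chroot_deny_shmat",
    "kernel.grsecurity.chroot_deny_unix",
    "kernel.grsecurity.chroot_deny_mount",
    "kernel.grsecurity.chroot_deny_fchdir",
    "kernel.grsecurity.chroot_deny_chroot",
    "kernel.grsecurity.chroot_deny_pivot",
    "kernel.grsecurity.chroot_enforce_chdir",
    "kernel.grsecurity.chroot_deny_mknod",
    "kernel.grsecurity.chroot_restrict_nice",
    "kernel.grsecurity.chroot_deny_sysctl",
    "kernel.grsecurity.chroot_findtask",
]


def sysctl_lines(enabled, all_names=CHROOT_SYSCTLS):
    """Return sysctl.conf-style lines with only the names in `enabled` set to 1."""
    return [f"{name} = {1 if name in enabled else 0}" for name in all_names]


def bisect_suspects(suspects, locks_up):
    """Halve the suspect list until one feature remains.

    locks_up(enabled) is a hypothetical callback: it should return True
    if the machine locked up with exactly `enabled` switched on.
    Assumes exactly one feature in `suspects` is responsible.
    """
    suspects = list(suspects)
    while len(suspects) > 1:
        first_half = suspects[: len(suspects) // 2]
        if locks_up(first_half):
            suspects = first_half
        else:
            suspects = suspects[len(first_half):]
    return suspects[0]
```

With 11 features this takes at most four test runs. If `kernel.grsecurity.grsec_lock` is set on the host, the sysctls cannot be changed at runtime, so each round would need the settings applied at boot instead.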
ramereth
 
Posts: 2
Joined: Tue Nov 01, 2011 11:35 am

Re: Random hard lockups under cpu load

Post by spender » Sun Dec 04, 2011 7:18 pm

Hi sir,

I believe this problem to be fixed in the latest 3.1.4 patch. It had to do with a lock being acquired by the chroot fchdir code while the kernel was performing an RCU-walk. I believe we had a workaround for you that you've been using, but just wanted to update that the feature is safe to use again.

Thanks,
-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

