PaX Team wrote: if these lockups are more or less reproducible, can you try one of the currently supported grsec patches (2.6.32 or 3.0, soon 3.1, instead of .39)? second, what is your .config (in particular, is it 32/64 bit x86? what PaX features are on, etc)? as for debugging, you should probably enable some lock debugging options (if your workload can tolerate the associated performance impact) and the NMI watchdog. also logging through netconsole may be able to catch the resulting logs but if you can attach a monitor and take a shot that'll do too.
They are fairly reproducible when I run several concurrent stage4 builds (i.e. using chroots extensively) on 64-bit. I've been able to narrow it down: the lockups only trigger when some of the grsec chroot features are enabled. Here's the list of features I have enabled:
kernel.grsecurity.chroot_deny_shmat = 1
kernel.grsecurity.chroot_deny_unix = 1
kernel.grsecurity.chroot_deny_mount = 1
kernel.grsecurity.chroot_deny_fchdir = 1
kernel.grsecurity.chroot_deny_chroot = 1
kernel.grsecurity.chroot_deny_pivot = 1
kernel.grsecurity.chroot_enforce_chdir = 1
kernel.grsecurity.chroot_deny_mknod = 1
kernel.grsecurity.chroot_restrict_nice = 1
kernel.grsecurity.chroot_deny_sysctl = 1
kernel.grsecurity.chroot_findtask = 1
I have had no luck getting output from netconsole when the machine locks up. I'm going to try to narrow down which chroot feature is actually causing it. I'll also try 3.0.x and get you the kernel config soon. If chroot is the culprit, that would match the hosts that appear to have this problem: our ftp server uses chroot in its rsyncd config, so I suspect some syncs may be triggering it. I haven't seen this happen on many hosts other than those that use chroots and are sometimes under high load.
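One way the narrowing-down could go is a simple bisection over the sysctls listed above: enable only half of them, run the concurrent stage4 builds, and keep whichever half still locks up. The sketch below is only an illustration, not a tested procedure. It assumes a single feature is responsible (an interaction between two features would defeat it), and the helper names `sysctl_lines`, `halves` and `bisect_features` are made up for this example. The `locks_up` callback stands in for the manual step of applying the settings and watching for a lockup.

```python
# Hypothetical helper for bisecting the grsec chroot sysctls listed above.
# Assumption: exactly one feature triggers the lockup.

CHROOT_FEATURES = [
    "chroot_deny_shmat",
    "chroot_deny_unix",
    "chroot_deny_mount",
    "chroot_deny_fchdir",
    "chroot_deny_chroot",
    "chroot_deny_pivot",
    "chroot_enforce_chdir",
    "chroot_deny_mknod",
    "chroot_restrict_nice",
    "chroot_deny_sysctl",
    "chroot_findtask",
]


def sysctl_lines(enabled, all_features=CHROOT_FEATURES):
    """Return sysctl.conf-style lines with only `enabled` features set to 1."""
    return [
        "kernel.grsecurity.%s = %d" % (name, 1 if name in enabled else 0)
        for name in all_features
    ]


def halves(features):
    """Split a list of features into two roughly equal halves."""
    mid = len(features) // 2
    return features[:mid], features[mid:]


def bisect_features(features, locks_up):
    """Narrow `features` down to one suspect.

    `locks_up(subset)` must return True if the machine locks up with only
    `subset` enabled. In practice this is a manual step: apply
    sysctl_lines(subset), run the builds, and report the result.
    """
    candidates = list(features)
    while len(candidates) > 1:
        first, second = halves(candidates)
        # If the first half alone reproduces the lockup, the suspect is in
        # it; otherwise (single-culprit assumption) it is in the second half.
        candidates = first if locks_up(first) else second
    return candidates[0]


if __name__ == "__main__":
    # Print the settings for the first round of the bisection.
    first_half, _ = halves(CHROOT_FEATURES)
    print("\n".join(sysctl_lines(first_half)))
```

With eleven features this takes at most four rounds of builds. Each round's output from `sysctl_lines` can be applied with `sysctl -p <file>`, provided the grsecurity sysctls have not been locked on that boot.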