Cron triggers general deadlock
Posted: Mon Apr 16, 2007 9:02 pm
Hi there,
I've been running Debian sid on one of my servers. After I upgraded and booted a new grsec-patched kernel, it started hanging in the morning when the daily cron jobs ran. After some investigation, I isolated the culprit script: /etc/cron.daily/find, which is basically a wrapper around updatedb.
At first, when I start the script, everything runs fine for a while. But toward the end of the run, or even after it has completed, bad things happen. Several processes get stuck in uninterruptible sleep and the load climbs. When I type commands over ssh, characters echo back v.e.r.y. .s.l.o.w.l.y., until the system eventually stops responding altogether.
It affects newly spawned processes as well as already running ones (top, sshd) and processes that are just sitting around without doing much (ospfd). I have a cron job that runs every minute, and I end up with several of them:
S /usr/sbin/cron
D \_ /USR/SBIN/CRON
D \_ /USR/SBIN/CRON
D \_ /USR/SBIN/CRON
...
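For anyone who wants to watch for the same symptom, here is a minimal sketch that lists processes currently in uninterruptible sleep (state D) by reading /proc/<pid>/stat directly. The function names parse_state and uninterruptible are made up for illustration, and it assumes a Python 3 interpreter is available on the box; plain ps shows the same information.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in uninterruptible():
        print(pid, comm)
```

Running it in a loop from a shell that is already open avoids spawning a new login once the machine starts to crawl, though the interpreter itself still has to be paged in.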
Meanwhile, the CPU spends around 80% of its time waiting (iowait), and most of the rest in system time. I should say that this server has too little memory and normally swaps a lot, so it can reach these figures, but it usually copes just fine (without crashing). Indeed, it swaps heavily while updatedb is running. But even after updatedb has completed and everything starts going wrong, I still see this CPU usage, with nothing left running that should cause intensive swapping. I've tried stracing some processes but saw nothing of interest. The logs show nothing unusual.
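The waiting and system percentages can also be sampled without top. The following is a rough sketch, not a definitive tool: it takes two readings of the aggregate "cpu" line in /proc/stat and reports how the elapsed time was split. The helper name cpu_shares and the 5-second interval are arbitrary choices for illustration.

```python
import time

# First seven counters on the "cpu" line of /proc/stat on 2.6 kernels.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]


def cpu_shares(before, after):
    """Return the fraction of elapsed time per field between two 'cpu' lines."""
    a = [int(x) for x in before.split()[1:8]]
    b = [int(x) for x in after.split()[1:8]]
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: d / total for name, d in zip(FIELDS, deltas)}


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)  # sampling interval, chosen arbitrarily
    second = read_cpu_line()
    for name, share in cpu_shares(first, second).items():
        print("%-8s %5.1f%%" % (name, 100 * share))
```

A high iowait share that persists after updatedb has exited would match what is described above.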
I'm afraid to have to say that this problem does not occur on a vanilla kernel. I had been running Linux 2.6.18.3 with grsecurity-2.1.9-2.6.18.2-200611100917.patch (I think) without any problem. It first occurred when I upgraded to 2.6.19.3 with grsecurity-2.1.10-2.6.19.3-200702061822.patch (I think), and I have just been investigating it under 2.6.20.7 with grsecurity-2.1.10-2.6.20.6-200704091818.patch (I'm sure).
In case it is useful, my server is an old Pentium MMX at 233 MHz with 48 MB of RAM and a SATA drive on a Promise SATA 300 TX2+ controller, with a normal / partition and several LVM partitions mounted, all on reiserfs. (It also has an IDE drive, but its modules were not even loaded at the time.) As far as I could see, updatedb does not "hang somewhere on a weird file." My .config is available at http://prue.dyn.linkfanel.net/config-2.6.20.7-grsec_andrea
As this server is not on the same continent as I am, I am open to limited experimentation only (no, rebooting remotely with "echo b > /proc/sysrq-trigger" just before losing all control is not that much fun).