[BUG] grsecurity 2.0.2 Stability problem

Discuss usability issues, general maintenance, and general support issues for a grsecurity-enabled system.

[BUG] grsecurity 2.0.2 Stability problem

Postby TTgrsec » Mon Nov 22, 2004 7:00 pm

We upgraded our servers to Linux 2.4.28 + grsecurity 2.0.2 and got bit by a rather nasty bug which forced us to revert to 2.4.27 + 2.0.1.

It seems there are a few problems:
1) RLIMITs are not being enforced
2) log message DOS protection not working
3) Interrupts are being masked when grsecurity is logging
4) Clock skew with grsecurity log flood control

In the case of #1, that is very easy to illustrate with the following spinme.sh script:
Code: Select all
#!/bin/sh
ulimit -t 5
while true; do :; done


From the logfiles on that server:
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 6 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:24892] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 6 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:24892] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 6 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:24892] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 6 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:24892] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 6 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:24892] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 6 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:24892] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 6 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:24892] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 6 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:24892] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 6 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:24892] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
......
......
......
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210
ttyR11.log.1.gz:grsec: From 68.202.145.28: attempted resource overstep by requesting 339 for RLIMIT_CPU against limit 5 by /home/fqbuild/work/spinme.sh[spinme.sh:11999] uid/euid:210/210 gid/egid:210/210, parent /bin/bash[bash:3007] uid/euid:210/210 gid/egid:210/210


Notice how the process ran for 339 seconds. It should have been killed at 6 seconds.

Which brings us to the next issue. When grsecurity was dutifully logging these resource oversteps, it is clear to see that the log flooding controls did not work. Ours is set to:
FLOODTIME: 10
FLOODBURST: 16

Also, when the logging was happening, interrupts appeared to be masked as the server's network card was no longer responsive. From the outside, it appeared that the server was dead, however through the serial console, the server was very much alive and caught in a very tight loop.

The only way to recover the server was to remotely pull its power and bring it back online using the older Linux 2.4.27 + grsecurity 2.0.1 combination.

The above error can be reproduced with that simple spinme.sh script, and also we had five of our servers appear to lockup before we had a chance to revert to the older kernel set.

One more mention, it appears that when grsecurity is handing message flooding, our server clocks become skewed, usually anywhere from 500ms to 1 second. With resource logging turned on, our clocks would become terribly skewed if it were not for us using a modified DJB clockspeed with aggressive adjtimex settings to counteract grsecurity skew.

If you need any additional details, please let us know.

--
Terra
TTgrsec
 
Posts: 9
Joined: Tue May 04, 2004 12:00 pm

Postby spender » Tue Nov 23, 2004 9:35 am

As for the first problem, I haven't changed the RLIMIT code at all, and kernel/timer.c (the rlimit in question here) hasn't changed either. There's always been SIGXCPU sent to the process when the soft limit is reached and SIGKILL when the hard limit is reached. From my test here:

spender@gw:~$ time sh ./spinme.sh
Killed

real 0m5.982s
user 0m5.990s
sys 0m0.010s

For the second problem, there was indeed a problem with the flood control that has been fixed in CVS and in grsecurity 2.0.2 on the website (since it was only a one line fix).

When grsec logs, it holds two read locks and one spinlock. It itself doesn't mask interrupts, though printk() does, so there's no way around this. Problem 4 may be related to this as well.

Hopefully fixing problem two should resolve the other side-effects you noticed.

-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

Postby TTgrsec » Tue Nov 23, 2004 11:05 am

Hello Brad,

Thank you for the followup.

The question still begs to be asked, what would have allowed that process allowed to execute beyond its CPU limit? Perhaps there is a deeper issue than meets the eye and only occurs when the kernel is configured a certain way. If you would like, I can privately send you our kernel config file.

The event happened on busy servers, and it may take a few runs on that script to get the error to occur. For me, it took 5 tries before I was able to cause the server to go belly up, running the script as non root user. FWIW, all servers were using a SMP kernel.

Linux 2.4.27 + grsecurity 2.0.1 works perfectly well, however Linux 2.4.28 + grsecurity 2.0.2 tanked pretty hard on us.

--
Terra
TTgrsec
 
Posts: 9
Joined: Tue May 04, 2004 12:00 pm

Postby spender » Tue Nov 23, 2004 11:27 am

I have an SMP machine here as well, and the script is running consistently here for me (20 runs or so). BTW if you're depending on a soft-limit cpu time restriction, any process can bypass it by handling the SIGXCPU signal. You would need to set the hard-limit as well to ensure that the process is terminated. Of course if you're using this just to stop run-away processes that wouldn't be modified in such a way, then it's not an issue.

As for what could have caused it: no idea, since the code involved in kernel/timer.c hasn't changed.

-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

Postby TTgrsec » Tue Nov 23, 2004 1:37 pm

Hello Brad,

I have confirmed that the process is being properly killed now with the updated grsecurity-2.0.2-2.4.28.patch (posted 11/23/2004)

If you remove that '} else return flooding;' statement though, the server will spin as shown above.

For clarity sake, you may want to update the website and change 11/20/2004 ==> 11/23/2004

Possibly even better is to place a patch revision as part of the filename, e.g.:
grsecurity-2.0.2-2.4.28-p1.patch
or
grsecurity-2.0.2_p1-2.4.28.patch

That way at a glance, we can easily see if the file was an incremental patch release since the notion of using '11/23/2004' doesn't always help... Some programs that download files don't always preserve the source's timestamp. Just something to consider.

Also, we use both soft and hard limits depending on if we want to allow certain processes to override the limit.

Thank you for putting the patched version online and hopefully this glitch won't rear its ugly head again. It was an eventful weekend watching multiple servers melt down while we were working to revert to prior kernel.

--
Terra
TTgrsec
 
Posts: 9
Joined: Tue May 04, 2004 12:00 pm


Return to grsecurity support

cron