grsecurity forums

by **closet geek** » Wed Sep 07, 2011 11:54 am

Hi,

We have just had a server with 200+ days of uptime restart due to this:

kernel: [17580054.226739] PAX: refcount overflow detected in: kblockd/6:287, uid/euid: 0/0
kernel: [17580054.227190] CPU 6

We've not seen this error before and want to understand more about it. We are running kernel 2.6.35.4 with grsec/PaX 201009172030.

Our first port of call is to do a full fsck on the filesystems and upgrade to the latest stable kernel + grsec. Is this a known bug that is fixed in PaX in the latest stable patch?

What causes this problem? Is it a sign of hardware problems, or is it purely a PaX/grsec/kernel bug?

Thanks.

by **specs** » Wed Sep 07, 2011 12:46 pm

Perhaps you should try to use the search function of the forum ;-)

Especially since you are using an old, unsupported kernel patch.

viewtopic.php?f=3&t=2705&p=10966&hilit=refcount#p10966

by **closet geek** » Wed Sep 07, 2011 12:48 pm

Hi,

I did search, but I'm specifically interested in understanding the nature of the problem with respect to kblockd. It'd be particularly reassuring to hear that this was a known bug in older versions of PaX that is now fixed in the newer kernel/grsec/PaX we are using.

Thanks.

by **specs** » Wed Sep 07, 2011 5:43 pm

I don't know if the refcount problem for kblockd has been fixed.

Refcount problems are mentioned a lot in the changelog (http://grsecurity.net/changelog-test.txt), but the functions that were patched are not.
To find out, you'd need to compare the old kernel with a recent kernel, or find several kernel patches and extract interdiffs.

by **PaX Team** » Thu Sep 08, 2011 8:36 am

closet geek wrote: I did search but I'm interested in specifically understanding the nature of the problem with respect to kblockd. It'd be particularly reassuring to hear that this was a known bug with Pax in older versions which I know is now fixed in the newer kernel/grsec/pax we are using.

just as i wrote in the above linked thread, i'd need the vmlinux image corresponding to the dmesg you showed (and i'll need the full message, especially the backtrace and the faulting EIP/RIP value). as for kblockd being the culprit, it's more likely that it just happened to execute the particular atomic op that triggered the protection, it could have been any other user of that particular atomic variable. but i can tell you more only if you give me more information.

by **closet geek** » Thu Sep 08, 2011 8:53 am

PaX Team wrote: just as i wrote in the above linked thread, i'd need the vmlinux image corresponding to the dmesg you showed (and i'll need the full message, especially the backtrace and the faulting EIP/RIP value). as for kblockd being the culprit, it's more likely that it just happened to execute the particular atomic op that triggered the protection, it could have been any other user of that particular atomic variable. but i can tell you more only if you give me more information.

There is nothing more in the log files than that, I'm afraid. The server logged what I posted and then rebooted itself...

Can you explain the process for me a little, though? Has PaX detected a refcount overflow (what is this?) in kblockd itself, or is there a general refcount counter somewhere that kblockd happened to tip over its limit, which caused PaX to kill kblockd and the server to reboot? I'm just trying to get my head around the order of events.

My main concern is what might trigger such an event e.g. someone trying something malicious or a hardware fault of some sort. Perhaps it is nothing as sinister as this.

Thanks for all your help, as you can tell this is a bit above my level but I am trying!

by **spender** » Thu Sep 08, 2011 11:54 am

Reference counters are often used to keep track of how many references there are to a certain kernel object. When the reference count goes to zero, the kernel knows that it can free the associated object. Kernel objects of this kind are "acquired" via a _get() call and "released" via a _put() call. When the number of calls to acquire does not match the number of releases for the object, there's an error in the reference counting. A common mistake is to forget to release the reference in all applicable error paths -- in some instances of this, it can be possible to continually increment a reference counter (the count increases with _get() and decreases with _put()) until the integer value holding the counter eventually wraps around to 0. This will cause the associated kernel object to be freed -- and yet there is still likely some code that is using the object. This is the kind of situation that makes a reference counter overflow equivalent to the well-known "use-after-free" bug class.
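The leak-then-wrap sequence described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct obj`, `obj_get()`, `obj_put()`, `use_obj()` and `leaks_until_wrap()` are hypothetical names rather than kernel APIs, the counter is deliberately 8 bits wide so the wrap is reached after a few hundred leaks instead of billions, and "freeing" is modelled by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object and helpers; illustrative names, not kernel APIs. */
struct obj {
    uint8_t refcount;   /* assumption: 8 bits so the wrap is reached quickly */
    bool    freed;      /* stand-in for the memory having been released */
};

static void obj_get(struct obj *o)
{
    o->refcount++;      /* stored back into 8 bits: 255 + 1 becomes 0 */
}

static void obj_put(struct obj *o)
{
    assert(o->refcount > 0);    /* an underflow would be a separate bug */
    if (--o->refcount == 0)
        o->freed = true;        /* last reference gone: "free" the object */
}

/* Buggy user: the error path returns without dropping its reference. */
static int use_obj(struct obj *o, bool fail)
{
    obj_get(o);
    if (fail)
        return -1;              /* BUG: missing obj_put(o) */
    obj_put(o);
    return 0;
}

/* Hit the leaking error path until the counter wraps to 0.
 * Returns the number of failing calls that were needed. */
static unsigned leaks_until_wrap(struct obj *o)
{
    unsigned calls = 0;

    while (o->refcount != 0) {
        use_obj(o, true);
        calls++;
    }
    return calls;
}
```

Starting from a single owner reference, the counter reads 0 once enough references have leaked, although the owner never released the object. The next correctly paired `use_obj(o, false)` then takes the count to 1 and back to 0 and frees the object under the owner, which is the use-after-free situation.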

The kernel generally uses the atomic_t type for reference counters, but in some cases also for usage counters kept for statistical purposes. PAX_REFCOUNT prevents the wraparound from occurring by detecting when an atomic_t value goes from (as a signed value) positive to negative. PaX attempts to remove the false positive case for usage counters by separating the use of atomic_t into checked and unchecked types -- checked for real reference counter use, and unchecked for the usage counters. When a usage counter reaches a high value (which often happens slowly, as a common case is keeping track of error counts), it can trigger the same mechanism in PAX_REFCOUNT if it has not yet been identified as a usage counter.
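The checked/unchecked split can be illustrated roughly as follows. This is a user-space sketch using C11 atomics, not PaX's actual implementation: `counter_t`, `counter_inc_checked()`, `counter_inc_unchecked()`, `counter_read()` and `overflow_demo()` are hypothetical names, and where the sketch merely returns `false`, the kernel would log the "refcount overflow detected" message and act on the offending task.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; a user-space illustration only. */
typedef struct { atomic_int v; } counter_t;

static int counter_read(counter_t *c)
{
    return atomic_load(&c->v);
}

/* Unchecked increment, for statistics counters where wrapping is harmless.
 * C11 defines signed atomic arithmetic to wrap silently on overflow. */
static void counter_inc_unchecked(counter_t *c)
{
    atomic_fetch_add(&c->v, 1);
}

/* Checked increment, for real reference counts.
 * Returns false and restores INT_MAX if the increment took the value
 * from positive to negative. (A sketch: between the add and the undo,
 * another thread could briefly observe the negative value.) */
static bool counter_inc_checked(counter_t *c)
{
    int old = atomic_fetch_add(&c->v, 1);

    if (old == INT_MAX) {
        atomic_fetch_sub(&c->v, 1);     /* undo: INT_MIN - 1 wraps to INT_MAX */
        return false;                   /* overflow detected */
    }
    return true;
}

/* Usage: the same overflow is caught for one counter and ignored for the other. */
static void overflow_demo(void)
{
    counter_t ref, stat;

    atomic_init(&ref.v, INT_MAX);
    atomic_init(&stat.v, INT_MAX);

    assert(!counter_inc_checked(&ref));         /* overflow caught ... */
    assert(counter_read(&ref) == INT_MAX);      /* ... and undone */

    counter_inc_unchecked(&stat);               /* wraps silently */
    assert(counter_read(&stat) == INT_MIN);
}
```

A statistics counter that is still incremented through the checked path behaves like `ref` here once it has counted up to `INT_MAX`, which is how a long-running machine can trip the protection without any real reference-counting bug.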

So that hopefully explains what reference counters are, what reference counter overflows are, what PAX_REFCOUNT is, how it works, where the false positives come from, and how PaX addresses them.

In your case, given the process the overflow occurred in, the age of the kernel, the uptime of the machine, and the fact that numerous false positives were fixed in filesystem code, I'd wager it was one of the fixed false positives, but without the full log we can't know for sure.

-Brad

by **closet geek** » Thu Sep 08, 2011 1:41 pm

Brad: Thank you so much for the detailed reply, it's genuinely appreciated. It all makes perfect sense to me now!

PAX: refcount overflow
