grsecurity forums

Posted: **Mon Oct 18, 2010 2:05 pm**

Hi,

Has anyone seen a problem like this? When the grsec patch is applied, various processes
end up stuck on some systems (see below). Usually ktime_get_ts is at the top of the trace.
The vanilla kernel seems to work fine.

[13320.258868] INFO: task rpcbind:14163 blocked for more than 120 seconds.
[13320.258874] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13320.258888] rpcbind D 00000000ffffffff 0 14163 2127 0x00000000
[13320.258894] ffff880041e87cc8 0000000000000086 ffff880041e87d28 ffff880041e87c78
[13320.258899] ffff880001886980 0000000000011140 0000000000011140 0000000000011140
[13320.258903] ffff880041e87fd8 ffff880041e87fd8 ffff880078fbd580 0000000000011140
[13320.258907] Call Trace:
[13320.258920] [<ffffffff810833e9>] ? ktime_get_ts+0xa9/0xe0
[13320.258928] [<ffffffff810ee9c0>] ? sync_page+0x0/0x50
[13320.258935] [<ffffffff8143240e>] io_schedule+0x6e/0xb0
[13320.258938] [<ffffffff810ee9f8>] sync_page+0x38/0x50
[13320.258943] [<ffffffff81432b92>] __wait_on_bit_lock+0x52/0xb0
[13320.258946] [<ffffffff810ee9a2>] __lock_page+0x62/0x70
[13320.258952] [<ffffffff81069ca0>] ? wake_bit_function+0x0/0x40
[13320.258957] [<ffffffff8110c302>] handle_mm_fault+0xb62/0xba0
[13320.258963] [<ffffffff81437f25>] do_page_fault+0x145/0x440
[13320.258968] [<ffffffff8103cf8d>] ? finish_task_switch+0x3d/0xc0
[13320.258972] [<ffffffff81431e83>] ? schedule+0x2c3/0x7e0
[13320.258975] [<ffffffff81434cd1>] ? restore_args+0x0/0x30
[13320.258979] [<ffffffff81434ec4>] page_fault+0x24/0x30
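
For anyone collecting these reports across several machines, a minimal sketch of pulling the hung-task warnings out of a saved kernel log might look like the following. It assumes the message format shown in the trace above; the function name `find_hung_tasks` and the dictionary keys are made up for illustration and are not part of any kernel or grsecurity tooling.

```python
import re

# Matches the hung-task warning as it appears in the log above, e.g.
# "[13320.258868] INFO: task rpcbind:14163 blocked for more than 120 seconds."
# The optional whitespace after "[" covers short timestamps such as "[ 9120.104058]".
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return one dict per hung-task warning found in log_text (hypothetical helper)."""
    hits = []
    for match in HUNG_RE.finditer(log_text):
        hits.append({
            "timestamp": float(match.group("ts")),  # seconds since boot
            "comm": match.group("comm"),            # task name
            "pid": int(match.group("pid")),
            "seconds": int(match.group("secs")),    # reported blocked time
        })
    return hits
```

Feeding it the saved output of `dmesg` and printing the `comm` and `pid` of each hit gives a quick list of which tasks were reported as blocked and when.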

Posted: **Tue Oct 19, 2010 7:54 am**

grsecurity-2.2.0-2.6.35.5-201009241805.patch seems to be fine (applied here to 2.6.35.6)
grsecurity-2.2.0-2.6.35.6-201009281623.patch is problematic (applied here to 2.6.35.7)
grsecurity-2.2.0-2.6.35.7-201010121028.patch also problematic (applied here to 2.6.35.7)

Posted: **Tue Oct 19, 2010 8:14 am**

Are you sure this is a grsecurity issue and not a 2.6.35.7 regression? I ask because there are a number of posts to be found about this problem on vanilla kernels. See:

https://bugzilla.redhat.com/show_bug.cgi?id=576749
http://comments.gmane.org/gmane.linux.r ... ral/370542
http://lkml.org/lkml/2010/10/15/264

-Brad

Posted: **Tue Oct 19, 2010 9:11 am**

The problem happens on a Dell Vostro notebook where no mdadm or any other RAID is used; the first two links describe RAID-related problems.

Third link (mail by Pawel Sikora) is the case I'm describing - he is using the same kernels as I am. The problem happens for more people using the same kernel (4 or 5 people so far).

At first we thought the problem was generic, but we weren't able to reproduce it on vanilla. We ran tests on our (heavily) patched kernel with and without grsecurity. Only the grsecurity version causes problems.

To be sure that the source of the problem isn't some weird interaction between our other patches and the grsecurity patch, we are going to test vanilla+grsec today.

Were there any changes between two mentioned grsecurity patches that could cause this?

Posted: **Fri Oct 22, 2010 3:48 am**

Pawel asked me to post this here. He was testing vanilla + grsec patch and was able to reproduce the problem. Unfortunately no serial console so only photos of oopses here:

http://pluto.agmk.net/kernel/

No softraid on this machine, so the previously mentioned url/bug reports about mdadm problem don't apply here.

Posted: **Fri Oct 22, 2010 5:09 am**

hi,

i'd like to say that i don't have a deterministic testcase to trigger the mentioned rpcbind task blocking.
it just happens during normal worktime activity (compiling, walking through automounted nfs shares, etc)
and memtest doesn't report any errors on either machine.

it happened yesterday during nfs services restart:
http://pluto.agmk.net/kernel/accidental ... estart.txt

and today's morning during machine shutdown (on smartd service shutdown):
http://pluto.agmk.net/kernel/*.jpg

finally, it could be a grsec bug or a vanilla bug exposed by grsec.
i'm not a kernel hacker to prove it from stacktraces

Posted: **Fri Oct 22, 2010 11:17 pm**

I looked at the interdiff between the patch you said was working and the patch that wasn't working. There's nothing in there that would cause the problems you're exhibiting.

The changes between the two patches were: aio range checking (fixed upstream later), the correct Xen fix (which ended up in 2.6.35.7), some minor refcount changes, a compat mode bugfix (fixed upstream later), and some memsets for infoleaks in various drivers (fixed upstream later).

http://www.gossamer-threads.com/lists/l ... el/1290016
confirms it's not caused by grsecurity/PaX

-Brad

Posted: **Fri Nov 05, 2010 6:36 pm**

Hello,

I've got a similar problem:

Code:
[ 9120.104058] INFO: task rdnssd:2712 blocked for more than 120 seconds.
[ 9120.104064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9120.104067] rdnssd D c248acd0 0 2712 1123 0x00000000
[ 9120.104072] dcfc06c0 00000086 df402e00 c248acd0 00000002 00115503 c21313a0 00115526
[ 9120.104079] dcfc06c0 defede90 c248acd0 00000002 0024fe52 defede88 0006644d 00250426
[ 9120.104085] 00000000 00066420 c1c2bca4 00000000 523cff34 dccd4700 00066408 00000002
[ 9120.104091] Call Trace:
[ 9120.104103] [<00115503>] ? __generic_unplug_device+0x23/0x30
[ 9120.104107] [<00115526>] ? generic_unplug_device+0x16/0x20
[ 9120.104112] [<0024fe52>] ? io_schedule+0x22/0x40
[ 9120.104116] [<0006644d>] ? sync_page+0x2d/0x40
[ 9120.104120] [<00250426>] ? __wait_on_bit_lock+0x46/0x90
[ 9120.104123] [<00066420>] ? sync_page+0x0/0x40
[ 9120.104126] [<00066408>] ? __lock_page+0xa8/0xb0
[ 9120.104132] [<0003fa10>] ? wake_bit_function+0x0/0x60
[ 9120.104137] [<0007a63c>] ? handle_mm_fault+0x8dc/0x8f0
[ 9120.104144] [<000f2800>] ? ext3_group_add+0x10a0/0x1750
[ 9120.104149] [<0001fc9f>] ? do_page_fault+0x13f/0x400
[ 9120.104153] [<0001fb60>] ? do_page_fault+0x0/0x400
[ 9120.104157] [<00251830>] ? error_code+0x8c/0x94
[ 9120.104162] [<00010202>] ? intel_pmu_enable_all+0x82/0xd0
[ 9120.104166] [<002513fa>] ? restore_all+0x0/0x18

It happens overnight with grsec, but hasn't happened in a week on a vanilla kernel. It seems always to start with the same process: rdnssd doing a regular fork()+exec(), but the child getting stuck apparently before the exec() call completes. Then other processes like ps start getting randomly stuck too.

If it's not caused by grsec, then it sure is exposed by it :/
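
The fork()+exec() pattern described above, with a parent that notices when the child fails to finish in time, can be sketched roughly as follows. This is only an illustration of the general pattern and not rdnssd's actual code; the helper name `spawn_and_wait` and its timeout values are assumptions made for the example.

```python
import os
import time


def spawn_and_wait(argv, timeout=5.0, poll_interval=0.05):
    """fork()+exec() argv (hypothetical helper).

    Returns the child's exit code, a negative signal number if it was
    killed by a signal, or None if it has not exited within `timeout`.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image. If exec fails, leave at once
        # with status 127 instead of returning into the parent's code.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)

    # Parent: poll without blocking so a stuck child cannot hang us too.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return -os.WTERMSIG(status)
        time.sleep(poll_interval)
    return None  # child still running (or blocked); the caller decides what to do
```

A `None` result only says the child did not finish in time. A task blocked in uninterruptible sleep (state D, as in the trace above) will not react to signals until the kernel wakes it, so the sketch deliberately does not try to kill the child.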

2.6.35.7 and grsecurity-2.2.0-2.6.35.7-201010121028.patch
