grsecurity forums

by **mutemule** » Wed Oct 07, 2015 10:26 am

I've been working on extending some of our grsecurity deployments, and I've run into an issue with the latest patchset for 3.14.54. This previously worked on a 3.14.51-based kernel with a grsecurity patchset from August 23/24, but we're seeing segfaults in nginx with the latest patchset. Our base OS is Ubuntu 14.04, and the kernel is built on Ubuntu 14.10 so we can get some updated GCC plugins.

To test, I've been placing a webserver with the kernel in production, and monitoring performance and behaviours. With the new kernel, we'll see segfaults somewhat regularly, although it's hard to predict how quickly they happen; sometimes it's within minutes, sometimes it takes a half hour:

Code:

    Oct 7 13:20:32 webtest kernel: [66581.193615] grsec: From 192.168.0.10: Segmentation fault occurred at (nil) in /usr/sbin/nginx[nginx:8166] uid/euid:33/33 gid/egid:33/33, parent /usr/sbin/nginx[nginx:8164] uid/euid:0/0 gid/egid:0/0

Since we use Lua, nginx is already exempt from mprotect. To try to figure out what might be causing it, I've exempted it from all PaX features; its flags are now "mprxs". This didn't help, and the segmentation faults are continuing.

Taking a quick look at a core dump:

Code:

    Reading symbols from /usr/sbin/nginx...done.
    [New LWP 8166]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Core was generated by `nginx: worker process '.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 ngx_ssl_new_session (ssl_conn=0x2ea1f00, sess=0x2d779e0) at src/event/ngx_event_openssl.c:2309
    2309 cache = shm_zone->data;
    (gdb) bt
    #0 ngx_ssl_new_session (ssl_conn=0x2ea1f00, sess=0x2d779e0) at src/event/ngx_event_openssl.c:2309
    #1 0x00006de9a1d33790 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
    #2 0x00006de9a1d12908 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
    #3 0x00000000004284f1 in ngx_ssl_handshake (c=c@entry=0x6de99b5f31d0) at src/event/ngx_event_openssl.c:1093
    #4 0x0000000000428738 in ngx_ssl_handshake_handler (ev=<optimized out>) at src/event/ngx_event_openssl.c:1245
    #5 0x000000000041ea58 in ngx_event_process_posted (cycle=cycle@entry=0x872ea0, posted=0x6b54f0 <ngx_posted_events>) at src/event/ngx_event_posted.c:33
    #6 0x000000000041e645 in ngx_process_events_and_timers (cycle=cycle@entry=0x872ea0) at src/event/ngx_event.c:265
    #7 0x00000000004240cc in ngx_worker_process_cycle (cycle=0x872ea0, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:767
    #8 0x0000000000422b46 in ngx_spawn_process (cycle=cycle@entry=0x872ea0, proc=proc@entry=0x423ffb <ngx_worker_process_cycle>, data=data@entry=0x1, name=name@entry=0x481386 "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:198
    #9 0x0000000000424225 in ngx_start_worker_processes (cycle=cycle@entry=0x872ea0, n=2, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:357
    #10 0x0000000000424a81 in ngx_master_process_cycle (cycle=cycle@entry=0x872ea0) at src/os/unix/ngx_process_cycle.c:129
    #11 0x00000000004081c6 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:419
    (gdb)

Note that I've only looked at the one core, and I haven't really done any further investigation.

Is there something that's changed in recent versions of grsecurity that might cause this?

by **PaX Team** » Wed Oct 07, 2015 12:21 pm

it's a NULL deref in nginx/src/event/ngx_event_openssl.c:ngx_ssl_new_session() where SSL_get_SSL_CTX apparently can (and in this case does) return a NULL ptr that is not checked for. now why that happens and whether it's not supposed to happen here (since there's no check for it) i can't tell, but in any case i don't immediately see how the kernel would be involved here directly. you could try a vanilla 3.14.latest kernel just to see if grsec's involved somehow and also ask the nginx guys if they have an idea how to debug it to the root cause (which can lead back to the kernel in which case we'll take over).

by **mutemule** » Wed Oct 07, 2015 3:42 pm

Looks like the bit of code is https://github.com/nginx/nginx/blob/release-1.8.0/src/event/ngx_event_openssl.c#L2307-L2309. Which then goes off into OpenSSL. Yay.

I wasn't sure what grsecurity might have to do with this. I'll build a vanilla kernel and see what I can find. Thanks!

by **kamil** » Thu Oct 08, 2015 8:53 am

Hi.

I think there's something wrong with the latest (201510072226) 3.14.54 patch. I'm seeing segfaults all over the place on Debian Wheezy amd64 right after bootup:

Code:

    Oct 8 14:27:45 host kernel: find[2182]: segfault at 18 ip 00000363bdbca41a sp 000003b7c9b6f8e0 error 6 in libc-2.13.so[363bdb52000+181000]
    Oct 8 14:27:45 host kernel: grsec: Segmentation fault occurred at 0000000000000018 in /usr/bin/find[find:2182] uid/euid:0/0 gid/egid:0/0, parent /etc/init.d/saslauthd[saslauthd:2179] uid/euid:0/0 gid/egid:0/0
    Oct 8 14:27:45 host kernel: grsec: denied resource overstep by requesting 4096 for RLIMIT_CORE against limit 0 for /usr/bin/find[find:2182] uid/euid:0/0 gid/egid:0/0, parent /etc/init.d/saslauthd[saslauthd:2179] uid/euid:0/0 gid/egid:0/0
    Oct 8 14:27:49 host kernel: /usr/sbin/amavi[2753]: segfault at 48 ip 000003533242eb7e sp 000003a606651e10 error 6 in libperl.so.5.14.2[35332349000+177000]
    Oct 8 14:27:49 host kernel: grsec: Segmentation fault occurred at 0000000000000048 in /usr/sbin/amavisd-new[/usr/sbin/amavi:2753] uid/euid:108/108 gid/egid:8/8, parent /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
    Oct 8 14:27:49 host kernel: grsec: bruteforce prevention initiated due to crash of /usr/sbin/amavisd-new against uid 108, banning suid/sgid execs for 15 minutes. Please investigate the crash report for /usr/sbin/amavisd-new[/usr/sbin/amavi:2753] uid/euid:108/108 gid/egid:8/8, parent /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
    Oct 8 14:30:48 host kernel: head[3311]: segfault at 1c8 ip 000002dffb4353b0 sp 000003dcd756f5b0 error 4 in ld-2.13.so[2dffb42a000+20000]
    Oct 8 14:30:48 host kernel: grsec: Segmentation fault occurred at 00000000000001c8 in /usr/bin/head[head:3311] uid/euid:0/0 gid/egid:0/0, parent /etc/init.d/clamav-freshclam[clamav-freshcla:3304] uid/euid:0/0 gid/egid:0/0
    Oct 8 14:30:48 host kernel: grsec: denied resource overstep by requesting 4096 for RLIMIT_CORE against limit 0 for /usr/bin/head[head:3311] uid/euid:0/0 gid/egid:0/0, parent /etc/init.d/clamav-freshclam[clamav-freshcla:3304] uid/euid:0/0 gid/egid:0/0

Kernel config:
https://gist.github.com/anonymous/048228b776aa18172dcd

Let me know if you need more info.

Best regards.
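For readers skimming kernel "segfault at" lines like the ones in the post above, here is a minimal Python sketch that pulls out the faulting address and decodes the error code. It assumes the usual x86 page-fault error code layout (bit 0: protection violation vs. page not present, bit 1: write, bit 2: user mode, bit 4: instruction fetch) and that the kernel prints the error code in hex. The names `SEGFAULT_RE`, `decode_error` and `parse_segfault` are made up for this illustration.

```python
import re

# Matches the kernel's "segfault at ..." line; the address, ip, sp and
# error code are all printed in hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Describe the low bits of an x86 page-fault error code."""
    mode = "user" if code & 0x4 else "kernel"
    if code & 0x10:
        access = "instruction fetch"
    elif code & 0x2:
        access = "write"
    else:
        access = "read"
    cause = "protection violation" if code & 0x1 else "page not present"
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line):
    """Return a dict for a 'segfault at' line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "error": decode_error(int(m.group("err"), 16)),
    }


# Sample line copied from the log in the post above.
line = ("Oct 8 14:27:45 host kernel: find[2182]: segfault at 18 "
        "ip 00000363bdbca41a sp 000003b7c9b6f8e0 error 6 "
        "in libc-2.13.so[363bdb52000+181000]")
print(parse_segfault(line))
```

Read this way, "error 6" is a user-mode write to a non-present page and "error 4" a user-mode read, both at very small addresses, which is the usual signature of a NULL pointer plus a small offset.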

by **mutemule** » Thu Oct 08, 2015 10:48 am

Aha! It took a while, but I successfully reproduced it on a vanilla kernel:

Code:

    Oct 8 14:13:57 webtest kernel: [ 5983.515252] show_signal_msg: 9 callbacks suppressed
    Oct 8 14:13:57 webtest kernel: [ 5983.515262] nginx[13605]: segfault at 0 ip 000000000042696d sp 00007fff475fe300 error 4 in nginx[400000+9f000]

This is a vanilla 3.14.54 using the exact same configuration as the grsecurity-patched kernel.
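As a side note, the trailing `in nginx[400000+9f000]` part of such a line gives the base and size of the mapping that contained the faulting instruction, so the offset of `ip` inside that object can be computed without a core dump. The sketch below is only illustrative; `MAPPING_RE` and `fault_offset` are hypothetical names, and it assumes the line format shown above.

```python
import re

# "ip <hex>" followed later by "in NAME[BASE+SIZE]" on a kernel segfault line.
MAPPING_RE = re.compile(
    r"\bip (?P<ip>[0-9a-f]+) .* in (?P<name>\S+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (object name, offset of ip inside the mapping), or None."""
    m = MAPPING_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    offset = ip - base
    # Sanity check: ip should fall inside the reported mapping.
    if not 0 <= offset < size:
        return None
    return m.group("name"), offset


line = ("nginx[13605]: segfault at 0 ip 000000000042696d "
        "sp 00007fff475fe300 error 4 in nginx[400000+9f000]")
name, offset = fault_offset(line)
print(name, hex(offset))
```

For a shared library, an offset like this is roughly what a symbolizer such as `addr2line -e` would take, assuming the same binary with debug symbols is available. For a non-PIE executable mapped at its link address, as nginx appears to be here, the raw `ip` value is usually the one to look up instead.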

Backtrace:

Code:

    Reading symbols from /usr/sbin/nginx...done.
    [New LWP 13605]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Core was generated by `nginx: worker process '.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 ngx_ssl_new_session (ssl_conn=0x19cb8b0, sess=0x1545830) at src/event/ngx_event_openssl.c:2309
    2309 cache = shm_zone->data;
    (gdb) bt
    #0 ngx_ssl_new_session (ssl_conn=0x19cb8b0, sess=0x1545830) at src/event/ngx_event_openssl.c:2309
    #1 0x00007f49cbdc8790 in ssl_update_cache (s=s@entry=0x19cb8b0, mode=mode@entry=2) at ssl_lib.c:2451
    #2 0x00007f49cbda7908 in ssl3_accept (s=0x19cb8b0) at s3_srvr.c:796
    #3 0x00000000004284f1 in ngx_ssl_handshake (c=c@entry=0x1c78f80) at src/event/ngx_event_openssl.c:1093
    #4 0x0000000000428738 in ngx_ssl_handshake_handler (ev=<optimized out>) at src/event/ngx_event_openssl.c:1245
    #5 0x000000000041ea58 in ngx_event_process_posted (cycle=cycle@entry=0x1236330, posted=0x6b54f0 <ngx_posted_events>) at src/event/ngx_event_posted.c:33
    #6 0x000000000041e645 in ngx_process_events_and_timers (cycle=cycle@entry=0x1236330) at src/event/ngx_event.c:265
    #7 0x00000000004240cc in ngx_worker_process_cycle (cycle=0x1236330, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:767
    #8 0x0000000000422b46 in ngx_spawn_process (cycle=cycle@entry=0x1236330, proc=proc@entry=0x423ffb <ngx_worker_process_cycle>, data=data@entry=0x1, name=name@entry=0x481386 "worker process", respawn=respawn@entry=-4) at src/os/unix/ngx_process.c:198
    #9 0x0000000000424225 in ngx_start_worker_processes (cycle=cycle@entry=0x1236330, n=2, type=type@entry=-4) at src/os/unix/ngx_process_cycle.c:357
    #10 0x0000000000425098 in ngx_master_process_cycle (cycle=0x1236330, cycle@entry=0x11c1cc0) at src/os/unix/ngx_process_cycle.c:242
    #11 0x00000000004081c6 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:419
    (gdb)

I'm already reaching out to the nginx guys, but this is definitely *not* a problem with grsecurity.

Thanks!

EDIT: For the archives: nginx has this filed as a known issue: https://trac.nginx.org/nginx/ticket/235. They're claiming that SSL_get_SSL_CTX() should never return NULL, so the problem is in OpenSSL (which is correct), so they don't want to fix it in nginx. I've asked that they fix it regardless, as one dropped/reset session is better than one dead worker.

by **kamil** » Thu Oct 08, 2015 4:05 pm

Ok, so it seems that my issue is not related to nginx crash.

I guess I'll create another thread for better visibility, unless someone is already looking into it?
I suppose it only happens in some specific kernel configuration or someone else would also notice it. I'll try to find the feature related to those crashes.

Best regards.

by **PaX Team** » Thu Oct 08, 2015 5:13 pm

kamil wrote: I guess I'll create another thread for better visibility, unless someone is already looking into it?
I suppose it only happens in some specific kernel configuration or someone else would also notice it. I'll try to find the feature related to those crashes.

i already tested your config briefly but nothing bad happened, so i can only suggest that you try a vanilla kernel too and/or figure out if there's a particular grsec option that causes these segfaults.

by **kamil** » Thu Oct 08, 2015 8:06 pm

PaX Team wrote: i already tested your config briefly but nothing bad happened, so i can only suggest that you try a vanilla kernel too and/or figure out if there's a particular grsec option that causes these segfaults.

Thanks for your reply.
I've tracked it down to UDEREF.

UDEREF off - no crashes
UDEREF on (even if it's the only grsec/PaX option enabled at compile time) - random crashes in random binaries. I often see crashes during the init sequence. If not, kernel compilation is the best test; something always segfaults in less than a minute.
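Narrowing a crash down to a single option like this can be done by halving the set of candidate options on each rebuild. The sketch below only shows the idea: it assumes exactly one option is responsible, that enabling it alone is enough to trigger crashes, and that the options can be toggled independently (real Kconfig dependencies are ignored). `find_culprit` and the `crashes` callback are hypothetical, and the option list is just an example. In practice `crashes` would mean "build a kernel with only these options enabled, boot it, and watch for segfaults".

```python
def find_culprit(options, crashes):
    """Bisect a list of options down to the one that triggers crashes.

    `crashes(enabled)` must return True when a kernel built with only the
    options in the set `enabled` shows the problem.
    """
    candidates = list(options)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if crashes(set(half)):
            candidates = half                    # culprit is in the first half
        else:
            candidates = candidates[len(half):]  # otherwise in the second half
    return candidates[0]


# Illustrative option list; the predicate stands in for a real build-and-boot test.
options = [
    "CONFIG_PAX_MPROTECT",
    "CONFIG_PAX_ASLR",
    "CONFIG_PAX_KERNEXEC",
    "CONFIG_PAX_MEMORY_UDEREF",
    "CONFIG_PAX_RANDKSTACK",
]
print(find_culprit(options, lambda enabled: "CONFIG_PAX_MEMORY_UDEREF" in enabled))
```

With n candidate options this needs about log2(n) rebuilds instead of n, which matters when each test is a full kernel build and boot.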

More crash logs:

Code:

    Oct 8 21:07:30 test kernel: ld[7392]: segfault at 18 ip 0000024bc0b75fcb sp 000003be323068b0 error 4 in libbfd-2.22-system.so[24bc0b01000+ec000]
    Oct 8 21:36:10 test kernel: PAX: From 89.73.159.133: execution attempt in: (null), 00000000-00000000 00000000
    Oct 8 21:36:10 test kernel: PAX: terminating task: /usr/lib/gcc/x86_64-linux-gnu/4.7/cc1(cc1):4664, uid/euid: 0/0, PC: (nil), SP: 000003e4af0fb368
    Oct 8 21:36:10 test kernel: PAX: bytes at PC: ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
    Oct 8 21:36:10 test kernel: PAX: bytes at SP-8: 0000000000000000 00000000007623fe 000003e4af0fb1f0 000001ad604b9b88 0000000000c71b15 ff38e4eb5fcc5a70 0000000000000001 000001ad604c8548 0000000000000003 0000000000000000 0000000002aa65e0
    Oct 8 22:17:38 test kernel: find[4454]: segfault at 0 ip 000000000040aabd sp 000003fd7448f4d0 error 4 in find[400000+37000]
    Oct 8 23:45:25 test kernel: loadkeys[2065]: segfault at 0 ip 000003594f781146 sp 000003fe3643ae98 error 4 in libc-2.13.so[3594f66e000+181000]

Kernel is compiled with gcc 4.7.2 from Wheezy.
I'm testing it on a kvm guest right now (linode vm), but I also saw this problem on real hardware.

Let me know how I can debug this further. I can also give you root access to the vm I'm testing it on.

Best regards.

by **PaX Team** » Thu Oct 08, 2015 8:52 pm

can you tell me which UDEREF gets activated on boot (it's in dmesg)? also /proc/cpuinfo may be useful (to know which of pcid/invpcid is available at all).

by **kamil** » Thu Oct 08, 2015 9:11 pm

PaX Team wrote: can you tell me which UDEREF gets activated on boot (it's in dmesg)? also /proc/cpuinfo may be useful (to know which of pcid/invpcid is available at all).

Sure:
PAX: PCID detected
PAX: strong UDEREF enabled
PAX: INVPCID detected

Whole dmesg:
https://gist.github.com/anonymous/6742e2bbf41ff80b7203

cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping : 2
microcode : 0x1
cpu MHz : 2499.986
cache size : 30720 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm arat xsaveopt stronguderef fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips : 4999.97
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 1
(same)
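Checking whether a CPU advertises pcid and invpcid comes down to looking for those names in the `flags` line of /proc/cpuinfo. A minimal sketch, assuming the usual `key : value` layout of that file; `cpu_flags` is a hypothetical helper and the sample text is a shortened stand-in for the real output above.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Shortened sample; on a live system the text would come from /proc/cpuinfo.
sample = "processor : 0\nflags : fpu pcid smep invpcid\nbogomips : 4999.97\n"
flags = cpu_flags(sample)
for name in ("pcid", "invpcid"):
    print(name, "yes" if name in flags else "no")
```

On a Linux box the same function can be fed `open("/proc/cpuinfo").read()`. Matching whole flag names rather than substrings avoids confusing `pcid` with `invpcid`.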

by **PaX Team** » Thu Oct 08, 2015 9:33 pm

what happens if you boot with nopcid on the kernel command line?

by **PaX Team** » Thu Oct 08, 2015 9:37 pm

also can you try a test kernel (4.1 or 4.2) to see if they behave better? (i recently rewrote the pcid handling mechanism that isn't backported to 3.14)

by **kamil** » Thu Oct 08, 2015 9:42 pm

PaX Team wrote: what happens if you boot with nopcid on the kernel command line?

No change, same segfaults with "slow and weak UDEREF".

by **kamil** » Thu Oct 08, 2015 9:50 pm

PaX Team wrote: also can you try a test kernel (4.1 or 4.2) to see if they behave better? (i recently rewrote the pcid handling mechanism that isn't backported to 3.14)

Yeah, I can try, but please note that this config worked just fine on 3.14.53, so it must be some change in the latest upstream version or grsec patch that's responsible for these problems.

Anyway, I'm off to bed now, I'll continue working on this issue tomorrow, let me know if there's something else that you want me to try.

by **quasar366** » Fri Oct 09, 2015 2:41 am

I can save some time for testing.
I've tested grsecurity-3.1-4.2.3-201510072230.patch on 3 servers and one desktop system (all Ubuntu 14.04 64-bit).

The segmentation faults occur very early in the boot process, before the root file system is mounted.
On one of my machines, the segmentation fault happens in udev, and the boot process stops after this error because no file system can be found.

On my KVM master machine I get "nf_conntrack: table full, dropping packet" messages 2 seconds after the iptables modules are loaded, but no segfaults.

The strange thing is that on my desktop system everything runs fine, without any segfaults.

Please let me know if you need any further information.

Segmentation faults in nginx with 3.1-201510012203
