
Oops with "PAX: suspicious general protection fault"

Posted: Wed May 18, 2011 1:26 pm
by moseleymark
Dunno if this is related to PAX or just PAX reporting an exception coming up from the kernel. The kernel is 2.6.38.4 (recent enough that I figured I'd check), and upgrading to 2.6.38.6 is not a problem. This is on 32-bit Debian Squeeze. The box will run anywhere from 12-24 hours or so before dying with this oops:

[11265.754756] PAX: suspicious general protection fault: 0000 [#1] SMP
[11265.763008] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
[11265.763008] Modules linked in: ip_queue evdev i5k_amb hwmon dcdbas i5000_edac button dm_mod ide_cd_mod cdrom [last unloaded: scsi_wait_scan]
[11265.763008]
[11265.763008] Pid: 20561, comm: httpd Tainted: G W 2.6.38.4 #1 Dell Inc. PowerEdge 1950/0DT097
[11265.763008] EIP: 0060:[<000ecbbe>] EFLAGS: 00210246 CPU: 3
[11265.763008] EIP is at vma_prio_tree_add+0x8e/0xb0
[11265.763008] EAX: 1aea1a99 EBX: ece1fb40 ECX: ffffffff EDX: ece1fb6c
[11265.763008] ESI: e9ba2de0 EDI: f0aab200 EBP: e834de7c ESP: e834de74
[11265.763008] DS: 0068 ES: 0068 FS: 00d8 GS: 007b SS: 0068
[11265.763008] Process httpd (pid: 20561, ti=ec8f43d8 task=ec8f4080 task.ti=ec8f43d8)
[11265.763008] Stack:
[11265.763008] f0e03c80 ece1fb40 e834ded0 00046864 ece1fbc0 f0e61b80 e9ba2de0 ec8f4080
[11265.763008] ec8f4080 ece1f980 ece1f984 ece1f96c ece1fb40 f0e03cb4 f0e61bb4 f0aab200
[11265.763008] f0aab1b4 00000000 00000000 ece1f960 fffffff4 00000000 f0d79580 e834df20
[11265.763008] Call Trace:
[11265.763008] [<00046864>] dup_mm+0x444/0x560
[11265.763008] [<00047704>] copy_process+0xd14/0x10c0
[11265.763008] [<01200011>] ? 0x1200011
[11265.763008] [<00200246>] ? zisofs_uncompress_block+0x206/0x420
[11265.763008] [<01200011>] ? 0x1200011
[11265.763008] [<00047b22>] do_fork+0x72/0x2c0
[11265.763008] [<01200011>] ? 0x1200011
[11265.763008] [<000a07aa>] ? audit_syscall_entry+0x1aa/0x1d0
[11265.763008] [<01200011>] ? 0x1200011
[11265.763008] [<01200011>] ? 0x1200011
[11265.763008] [<0000bf98>] sys_clone+0x38/0x50
[11265.763008] [<01200011>] ? 0x1200011
[11265.763008] [<00004585>] ptregs_clone+0x15/0x20
[11265.763008] [<005f2e41>] ? syscall_call+0x7/0xb
[11265.763008] [<01200011>] ? 0x1200011
[11265.763008] [<00200282>] ? zisofs_uncompress_block+0x242/0x420
[11265.763008] Code: 53 30 89 0a 5b 5e 5d c3 90 8d 74 26 00 8d 43 2c 89 73 38 89 43 2c 89 43 30 89 5e 38 5b 5e 5d c3 90 8d 74 26 00 8b 46 2c 8d 53 2c <89> 50 04 89 43 2c 8d 46 2c 89 43 30 89 56 2c 5b 5e 5d c3 0f 0b
[11265.763008] EIP: [<000ecbbe>] vma_prio_tree_add+0x8e/0xb0 SS:ESP 0068:e834de74
[11266.155419] ---[ end trace 847e4d83bdf41d99 ]---
[11266.165246] Kernel panic - not syncing: Fatal exception
[11266.176217] Pid: 20561, comm: httpd Tainted: G D W 2.6.38.4 #1
[11266.190301] Call Trace:
[11266.195841] [<005eec32>] ? panic+0x5c/0x179
[11266.205133] [<00200246>] ? zisofs_uncompress_block+0x206/0x420
[11266.217535] [<00200246>] ? zisofs_uncompress_block+0x206/0x420
[11266.229941] [<005f429e>] ? oops_end+0x8e/0xd0
[11266.239424] [<00007775>] ? die+0x55/0x80
[11266.247908] [<00200246>] ? zisofs_uncompress_block+0x206/0x420
[11266.260442] [<005f40ef>] ? do_general_protection+0x21f/0x230
[11266.272397] [<005f3ed0>] ? do_general_protection+0x0/0x230
[11266.284077] [<005f3650>] ? error_code+0x80/0x90
[11266.293868] [<000ecbbe>] ? vma_prio_tree_add+0x8e/0xb0
[11266.304857] [<00210246>] ? nfs_direct_commit_release+0x36/0xf0
[11266.317253] [<00046864>] ? dup_mm+0x444/0x560
[11266.326668] [<00047704>] ? copy_process+0xd14/0x10c0
[11266.337336] [<01200011>] ? 0x1200011
[11266.345305] [<00200246>] ? zisofs_uncompress_block+0x206/0x420
[11266.357666] [<01200011>] ? 0x1200011
[11266.365537] [<00047b22>] ? do_fork+0x72/0x2c0
[11266.375001] [<01200011>] ? 0x1200011
[11266.382897] [<000a07aa>] ? audit_syscall_entry+0x1aa/0x1d0
[11266.394717] [<01200011>] ? 0x1200011
[11266.402599] [<01200011>] ? 0x1200011
[11266.410545] [<0000bf98>] ? sys_clone+0x38/0x50
[11266.420302] [<01200011>] ? 0x1200011
[11266.428231] [<00004585>] ? ptregs_clone+0x15/0x20
[11266.438403] [<005f2e41>] ? syscall_call+0x7/0xb
[11266.448234] [<01200011>] ? 0x1200011
[11266.456124] [<00200282>] ? zisofs_uncompress_block+0x242/0x420

I have FRAME_POINTER on and KALLSYMS on (and CONFIG_GRKERNSEC_HIDESYM off), so I'm not sure where that one address (0x1200011) is coming from or why it's not decoded. If you guys think it's possibly a PAX issue, let me know what to send along (vmlinux, System.map, etc). Thanks!

Re: Oops with "PAX: suspicious general protection fault"

Posted: Wed May 18, 2011 2:24 pm
by cormander
This is just PAX reporting an abnormal GPF from the kernel. You can see the change made here:

@@ -305,6 +327,13 @@ gp_in_kernel:
        if (notify_die(DIE_GPF, "general protection fault", regs,
                                error_code, 13, SIGSEGV) == NOTIFY_STOP)
                return;
+
+#if defined(CONFIG_X86_32) && defined(CONFIG_PAX_KERNEXEC)
+       if ((regs->cs & 0xFFFF) == __KERNEL_CS || (regs->cs & 0xFFFF) == __KERNEXEC_KERNEL_CS)
+               die("PAX: suspicious general protection fault", regs, error_code);
+       else
+#endif
+
        die("general protection fault", regs, error_code);
 }


From the stack trace it looks like Apache is choking on a read from zisofs decompression. Are you mounting an ISO file, or do you have a CD/DVD in the drive?

This is most likely a bug in the upstream kernel.

Re: Oops with "PAX: suspicious general protection fault"

Posted: Wed May 18, 2011 3:12 pm
by moseleymark
The zisofs_uncompress_block bit is especially puzzling to me. We don't have any CDs or any ISO images mounted on these boxes. The only filesystem types that we have mounted are these:

# cat /proc/mounts | awk '{ print $3 }' | sort -u
cgroup
devpts
devtmpfs
ext3
ext4
nfs
proc
rootfs
sysfs
tmpfs

Nothing ISO-ish there. Looking at objdump of vmlinux, zisofs_uncompress_block does start at 00200040, which is consistent with the decoded offsets in the trace (System.map agrees).

Re: Oops with "PAX: suspicious general protection fault"

Posted: Fri May 20, 2011 4:03 am
by PaX Team
moseleymark wrote: Dunno if this is related to PAX or just PAX reporting an exception coming up from the kernel.
that's a good question, i can't tell it from this log. what happened was that during fork and copying some address space management related data the kernel dereferenced an invalid pointer (as in, it didn't point to kernel space, i guess you had UDEREF on). now where that pointer came from is something i don't know. do you have identical/similar machines running the same kernel/workload? do you see the problem manifest there too (just trying to find out if it could be a hw related problem)? what is your kernel config (at least the PaX bits)? can you tell when this problem began to show up (provided you ran this workload with earlier kernels)? in any case, updating to a newer kernel won't hurt ;).
moseleymark wrote: so I'm not sure where that one address (0x1200011) is coming from or why it's not decoded.
the backtrace code will decode anything that looks like a code address, regardless of the frame pointer (the '?' indicates these unreliable 'addresses'). since you have KERNEXEC enabled, low values all look like potential code addresses.