
Semi-random kernel panics on 3.7.1+201212271953

Posted: Sun Dec 30, 2012 9:25 am
by Neokernsec
I've been trying to track down the cause of semi-random kernel panics on vanilla 3.7.1 patched with grsec-201212271953.

I say semi-random because when I issue a remote shutdown of the server via ssh, it panics fairly regularly shortly after the shutdown notice is broadcast, but BEFORE the filesystems are unmounted. The TCP connections aren't torn down cleanly as they are in a normal shutdown; my ssh client takes a while to figure out they're dead. The filesystems are left dirty and need journal fixups on reboot.

At least one panic occurred during a kernel rebuild initiated over ssh. Symbols from the Intel e1000 driver were visible in that stack dump.

Distro: Slackware 14 x86_64, both "pristine", and fully patched (updated to gcc-4.7.2, binutils, and a formal adoption of 3.7.1).

Server: Dual-socket Xeon 2.8GHz with 16GB of ECC DDR2-400 (PC2-3200) RAM (SuperMicro X6DHR-8G2). The system was burned in for 72 hours following some routine maintenance and RAM upgrades. Before that, it had run for 4 years before finally being shut down for a storage and RAM upgrade.

Disks: AIC-7902B with 4 x 147GB Hitachi 15K RPM U320 drives. (Brand new, and fully scanned for bad blocks.)

Kernels 2.6.x and 3.2.x running on this same hardware do not exhibit this problem; the 2.6.x kernels were running older grsec patches.

Kernel panic screen: [image]

Kernel configuration: http://pastebin.com/XF9REDTu

Crashing Kernel: http://www.mediafire.com/?qtbrunyb797xair
System.map file for above kernel: http://www.mediafire.com/?7s5t452g44blpjc

Server lspci output:
00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 0c)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0c)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c)
00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 0c)
00:1d.0 USB controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09)
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09)
02:03.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
02:03.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
03:04.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
03:04.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
06:01.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Rage XL (rev 27)

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Sun Dec 30, 2012 7:19 pm
by PaX Team
would it be possible to capture the full kernel logs (over netconsole or similar) perhaps? also try to disable the size overflow plugin and see if you can still reproduce the crashes.
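netconsole sends kernel log messages as UDP datagrams to a remote host (target port 6666 by default), so the receiving side only needs something listening on that port. A minimal Python sketch of such a listener, using only the standard socket module; the names `receive_netconsole` and `serve` are hypothetical helpers, not part of any kernel tool. One caveat worth hedging: netconsole transmits through the same network driver, so if the crash is in that driver's transmit path, the final trace may never make it out.

```python
import socket

NETCONSOLE_DEFAULT_PORT = 6666  # netconsole's default target (receiving) port


def receive_netconsole(sock: socket.socket, count: int, bufsize: int = 4096) -> list[str]:
    """Read `count` netconsole datagrams from an already-bound UDP socket."""
    messages = []
    for _ in range(count):
        data, _sender = sock.recvfrom(bufsize)
        # Kernel log text is plain ASCII in practice; don't crash on stray bytes.
        messages.append(data.decode("utf-8", errors="replace"))
    return messages


def serve(port: int = NETCONSOLE_DEFAULT_PORT) -> None:
    """Print every netconsole message received on `port`, forever."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        while True:
            for msg in receive_netconsole(sock, 1):
                print(msg, end="", flush=True)
```

Start `serve()` (passing the target port you configured, if it isn't the default) on the receiving machine before triggering the shutdown; netcat in UDP listen mode does the same job.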

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Sun Dec 30, 2012 8:13 pm
by Neokernsec
I'll try. I suspect I'll have to rig up a serial cable, because the panics seem to happen within the bowels of the networking stack. I've been trying to reproduce the panics with the same kernel while avoiding ssh, and after six tries, have FAILED to reproduce the panic. Every panic's backtrace shows tcp_* or network driver code paths, so that's another data point. I'll do a kernel rebuild on the local console, stress the machine, and see if I can induce a panic while being "net-less".

I doubt netconsole would give me anything, but I'll try that first since I'd have to dig through "deep storage" for one of my null modem cables. Thankfully, the PC platform has kept putting serial ports on motherboards up until VERY recently. :)

I'll also do as you suggest and disable the size overflow plugin and retry the "ssh" and "ssh-less" trial runs.

I'm also using a much larger VESA framebuffer mode, in case I'm forced to rely on caveman "eyeball" stacktrace logging technology.

Thanks for your speedy reply.

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Sun Dec 30, 2012 8:38 pm
by spender
Have you tried any previous 3.7 patch?

-Brad

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Sun Dec 30, 2012 8:45 pm
by Neokernsec
Actually, no. I came to grsec looking for the current 3.6 patch, but noticed that you'd already moved on to 3.7. I can't use 3.2 because of some strange lockups when marshalling multiple TBs of data to and from software mdadm RAID volumes. (I had no spare cycles to diagnose them; they just happened. Upgrading to 3.6 fixed them, as did downgrading to 2.6.)

I can try the previous 3.7.1 patch if you think it'd make any difference.

First round of tests:

With SIZE_OVERFLOW_PLUGIN=y still set, but with netconsole and the bigger framebuffer enabled, the panic occurred on shutdown while ssh connections were active.

Alas, there's no stacktrace on either the netconsole (unsurprisingly), or the framebuffer console. (Perhaps the more complex vesafb console code dies earlier?)

All I got was:

INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
Running shutdown script /etc/rc.d/rc.6:
Saving system time to the hardware clock (UTC)
[ 5366.149884]: grsec: time set by /sbin/hwclock[hwclock:6612] blahblah...
[ 5366.178488]: Kernel panic - not syncing: Aieee, killing interrupt handler!
[ 5366.184983]: Pid: 1144, comm: sshd Not tainted 3.7.2-grsec #5
Unmounting remote filesystems. [5366.191002] Call trace:

and that's it.

Next iteration, I'll try the same exact session/workload, with serial console enabled, *AND* SIZE_OVERFLOW DISABLED. If that doesn't panic, I'll try the kernel with the SIZE_OVERFLOW plugin enabled, but with serial console.

Time to find that rusty null modem cable... :D

Wow! Not so rusty after all! I guess that time when I was doing Atmel AVR + Mitsubishi M16C development involved some hairy serial port bringup work. The cable's a rather deluxe 9/25 -> 9/25 hydra deal meant to handle any/all kinds of gnarly systems & breadboards. :evil:

OK, here's the full dump via serial port console:
Code:
[ 1325.012637] grsec: time set by /usr/sbin/ntpd[ntpd:1099] uid/euid:0/0 gid/egid:0/0, parent /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
INIRunning shutdown script /etc/rc.d/rc.6:
Saving system time to the hardware clock (UTC).
[ 6661.506785] grsec: time set by /sbin/hwclock[hwclock:32584] uid/euid:0/0 gid/egid:0/0, parent /etc/rc.d/rc.6[rc.6:32579] uid/euid:0/0 gid/egid:0/0
[ 6661.545775] Kernel panic - not syncing: Aiee, killing interrupt handler!
Unmounting remot[ 6661.552032] Pid: 1168, comm: sshd Not tainted 3.7.1-grsec #5
e filesystems.
[ 6661.571438] Call Trace:
[ 6661.573763]  [<ffffffff81727e66>] ? panic+0xc4/0x1cb
[ 6661.573763]  [<ffffffff8108df46>] ? do_exit+0x896/0x8c0
[ 6661.573763]  [<ffffffff8108e40d>] ? do_group_exit+0x3d/0xa0
[ 6661.573763]  [<ffffffff8118bb33>] ? report_size_overflow+0x33/0x40
[ 6661.573763]  [<ffffffff81644ac9>] ? skb_pad+0x189/0x1b0
[ 6661.573763]  [<ffffffff811471b8>] ? get_page_from_freelist+0x1b8/0x430
[ 6661.573763]  [<ffffffff8153c211>] ? e1000_xmit_frame+0x561/0x10c0
[ 6661.573763]  [<ffffffff8121450c>] ? ext3_dirty_inode+0x5c/0xb0
[ 6661.573763]  [<ffffffff8121113e>] ? ext3_mark_iloc_dirty+0x33e/0x400
[ 6661.573763]  [<ffffffff81654598>] ? dev_hard_start_xmit+0x268/0x4f0
[ 6661.573763]  [<ffffffff8167753d>] ? sch_direct_xmit+0x10d/0x1e0
[ 6661.573763]  [<ffffffff81654bf0>] ? dev_queue_xmit+0x170/0x440
[ 6661.573763]  [<ffffffff81691cf5>] ? ip_finish_output+0x2e5/0x3d0
[ 6661.573763]  [<ffffffff81691f2a>] ? ip_queue_xmit+0x6a/0x3f0
[ 6661.573763]  [<ffffffff816abe99>] ? tcp_transmit_skb+0x3d9/0x900
[ 6661.573763]  [<ffffffff816ac5a8>] ? tcp_write_xmit+0x118/0xa60
[ 6661.573763]  [<ffffffff8117bd88>] ? ksize+0x18/0xc0
[ 6661.573763]  [<ffffffff81644b90>] ? __alloc_skb+0xa0/0x2b0
[ 6661.573763]  [<ffffffff816acf5a>] ? __tcp_push_pending_frames+0x2a/0x90
[ 6661.573763]  [<ffffffff816a0a88>] ? tcp_close+0x3a8/0x440
[ 6661.573763]  [<ffffffff816c4c98>] ? inet_release+0x78/0x90
[ 6661.573763]  [<ffffffff81635efc>] ? sock_release+0x2c/0xb0
[ 6661.573763]  [<ffffffff816362f7>] ? sock_close+0x17/0x40
[ 6661.573763]  [<ffffffff811878e9>] ? __fput+0xe9/0x250
[ 6661.573763]  [<ffffffff810ace70>] ? task_work_run+0xb0/0xd0
[ 6661.573763]  [<ffffffff8108d84f>] ? do_exit+0x19f/0x8c0
[ 6661.573763]  [<ffffffff8108e40d>] ? do_group_exit+0x3d/0xa0
[ 6661.573763]  [<ffffffff8108e487>] ? sys_exit_group+0x17/0x20
[ 6661.573763]  [<ffffffff81735128>] ? system_call_fastpath+0x18/0x1d


Crashing kernel: http://www.mediafire.com/?qtbrunyb797xair
System.map file for above kernel: http://www.mediafire.com/?7s5t452g44blpjc

UPDATE: OK, the first reboot into 3.7.1-grsec with SIZE_OVERFLOW DISABLED (built with make clean && make && make modules_install) looks good. I rebuilt the kernel, had two ssh sessions open, one running top, and did an ssh-initiated shutdown -r now. Everything went well. I'll run more systematic tests shortly.

UPDATE 2: I can't reproduce the panic with SIZE_OVERFLOW DISABLED, so it's looking good so far. I had several ssh sessions going, some doing an evil cat */*/*/*, others doing a compute-intensive rebuild, and issued a shutdown -h now, with no problems.

I suspect the grsec devs know the weak points, since they zeroed in on the overflow plugin straight away.
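For context on why that plugin is the suspect: as I understand it, the size_overflow plugin instruments integer arithmetic whose result feeds a size argument, recomputes it in a wider type, and calls report_size_overflow() when the true result doesn't fit the original type. The traces show exactly that call inside skb_pad(), followed by do_group_exit(). A minimal Python emulation of the idea, assuming a 32-bit unsigned size; `checked_add_u32`, `wrapping_add_u32` and `SizeOverflow` are hypothetical names for illustration, not the plugin's code.

```python
U32_MAX = 0xFFFF_FFFF


class SizeOverflow(Exception):
    """Stands in for report_size_overflow(): the checked value didn't fit."""


def checked_add_u32(a: int, b: int) -> int:
    """Add two u32 values the way an instrumented size computation might.

    The sum is computed with unbounded precision (the "wider type") and
    then compared against the range of the original 32-bit type.
    """
    for v in (a, b):
        if not 0 <= v <= U32_MAX:
            raise ValueError(f"{v} is not a u32")
    wide = a + b  # Python ints never wrap, so this is the true result
    if wide > U32_MAX:
        raise SizeOverflow(f"{a} + {b} = {wide} does not fit in u32")
    return wide


def wrapping_add_u32(a: int, b: int) -> int:
    """What uninstrumented C code computes: the sum modulo 2**32."""
    return (a + b) & U32_MAX
```

For example, `wrapping_add_u32(U32_MAX, 2)` silently yields 1, while `checked_add_u32(U32_MAX, 2)` raises. A check like this can also fire on code whose wraparound is harmless or intended, which is one reason such a report may point at the plugin rather than at the networking code.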

I wonder whether it would be possible to publish a "scary score" for the various hardening options, expressing the developers' sense of how fragile each option's code is relative to the security benefit it provides (assuming it works, of course).

While I have this box and the neolithic cables all set up here, I'm happy to perform additional tests and regression runs over the next 16 hours, if the devs would like.

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Mon Dec 31, 2012 12:08 pm
by ephox
Hi,

Thanks for the report.
Could you do the following please:
* send me vmlinux
* add "loglevel=8" to the kernel command line at boot
* enable CONFIG_FRAME_POINTER in .config.
* and send me the backtrace again (it will hopefully contain this message as well: "PAX: size overflow detected in function...")
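Why the frame pointer matters here: in x86 call traces, an entry printed with a leading `?` is an address the unwinder found on the stack but could not verify as part of the call chain. With CONFIG_FRAME_POINTER=y it can follow the saved frame pointers, so verified frames print without the `?`. That's visible in this thread: every entry in the first serial-console trace carries `?`, while the frame-pointer build's traces mostly don't. A minimal Python sketch that parses one trace entry, assuming the `[<address>] ? func+0xoff/0xsize [module]` layout seen in these dumps; `parse_frame`, `reliable_chain` and `Frame` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches entries such as
#   [   95.409670]  [<ffffffffa00a05d8>] e1000_xmit_frame+0x528/0x1050 [e1000]
#   [ 6661.573763]  [<ffffffff81644ac9>] ? skb_pad+0x189/0x1b0
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


@dataclass
class Frame:
    addr: int
    func: str
    offset: int
    size: int
    module: Optional[str]  # None for the core kernel image
    reliable: bool         # False when the entry was printed with '?'


def parse_frame(line: str) -> Optional[Frame]:
    """Return the call-trace entry on `line`, or None if there isn't one."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        addr=int(m["addr"], 16),
        func=m["func"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        module=m["module"],
        reliable=m["unreliable"] is None,
    )


def reliable_chain(lines: list[str]) -> list[str]:
    """Function names of the frames the unwinder vouched for (no '?')."""
    frames = (parse_frame(line) for line in lines)
    return [f.func for f in frames if f is not None and f.reliable]
```

Feeding a whole trace to `reliable_chain()` leaves the actual call path, which is usually what you want to read first.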

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Mon Dec 31, 2012 9:11 pm
by Neokernsec
As you wish!
loglevel=8, CONFIG_FRAME_POINTER=y enabled

I'm not seeing the message you've mentioned, though.
Are there any other FRAME POINTER .config options I might have missed?

Here's a filtered excerpt of my config file. (I do a make clean before rebuilding the kernels out of paranoia, by the way.)

Code:
$ grep -i frame /boot/config-3.7.1grsecdebug
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# Frame buffer hardware drivers
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
CONFIG_FRAME_WARN=0
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
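A grep like this works, but note that a generated .config records an option in two forms: `CONFIG_X=y` (or `=m`, or a value) when it's set, and a `# CONFIG_X is not set` comment when it's disabled, and a case-insensitive grep matches both plus unrelated comments. A minimal Python sketch that reads a .config into a dict, treating "is not set" lines as an explicit `n`; `parse_kconfig` is a hypothetical helper, not a kernel script.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text: str) -> dict[str, str]:
    """Map each CONFIG_ symbol to its value; disabled symbols map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if m := _SET.match(line):
            options[m[1]] = m[2].strip('"')  # drop quotes around string values
        elif m := _UNSET.match(line):
            options[m[1]] = "n"
        # Section comments like "# Frame buffer hardware drivers" are ignored.
    return options
```

For example, `parse_kconfig(open("/boot/config-3.7.1grsecdebug").read()).get("CONFIG_FRAME_POINTER")` should give `"y"` for the config shown above.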

Config file: http://www.mediafire.com/?885eq3k2c5b81pm
System.map: http://www.mediafire.com/?g854emxf85now6f
vmlinuz: http://www.mediafire.com/?ehzkp25yybtvkc0
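Since System.map lists every core-kernel symbol's address, type and name, a raw address from a trace can be mapped back to `symbol+offset` with a binary search over the sorted addresses. This only covers the kernel image itself: addresses inside loadable modules such as e1000 (the `ffffffffa0...` entries tagged `[e1000]`) are not in System.map. A minimal Python sketch, assuming the standard three-column `address type name` layout; `load_system_map` and `resolve` are hypothetical names.

```python
import bisect
from typing import Optional


def load_system_map(text: str) -> tuple[list[int], list[str]]:
    """Parse 'ffffffff81000000 T _text' lines into sorted address/name lists."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            entries.append((int(parts[0], 16), parts[2]))
    entries.sort()
    return [addr for addr, _ in entries], [name for _, name in entries]


def resolve(addrs: list[int], names: list[str], addr: int) -> Optional[str]:
    """Name the nearest symbol at or below `addr` as 'symbol+0xoffset'.

    Symbol sizes aren't in System.map, so an address past the end of the
    last function is still attributed to it.
    """
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    return f"{names[i]}+{addr - addrs[i]:#x}"
```

Load the System.map that belongs to the crashing build; addresses from a different build will resolve to the wrong symbols.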

You don't need to do anything special to trigger the panic. Just issuing a shutdown -[rh] now from ssh will do it.

Current stacktrace:
Code:
Welcome to Linux 3.7.1-grsecdebug (ttyS0)

isengard login: INIRunning shutdown script /etc/rc.d/rc.6:
Saving system time to the hardware clock (UTC).
[   95.356169] grsec: time set by /sbin/hwclock[hwclock:1191] uid/euid:0/0 gid/egid:0/0, parent /etc/rc.d/rc.6[rc.6:1186] uid/euid:0/0 gid/egid:0/0
[   95.400241] Kernel panic - not syncing: Aiee, killing interrupt handler!
[   95.409670] Pid: 1147, comm: sshd Not tainted 3.7.1-grsecdebug #5
[   95.409670] Call Trace:
[   95.409670]  [<ffffffff816e5fbd>] panic+0xbb/0x1c5
[   95.409670]  [<ffffffff8108f180>] do_exit+0x8b0/0x8e0
[   95.409670]  [<ffffffff810065cc>] ? show_trace_log_lvl+0x5c/0x80
[   95.409670]  [<ffffffff8108f65f>] do_group_exit+0x3f/0xa0
[   95.409670]  [<ffffffff8118dc53>] report_size_overflow+0x33/0x40
[   95.409670]  [<ffffffff81606c17>] skb_pad+0x197/0x1c0
[   95.409670]  [<ffffffff8128575e>] ? do_get_write_access+0x3be/0x4b0
[   95.409670]  [<ffffffffa00a05d8>] e1000_xmit_frame+0x528/0x1050 [e1000]
[   95.409670]  [<ffffffff811bf2d0>] ? __find_get_block_slow+0xd0/0x190
[   95.409670]  [<ffffffff811c158d>] ? __getblk+0x2d/0x2e0
[   95.409670]  [<ffffffff810b2417>] ? bit_waitqueue+0x17/0xb0
[   95.409670]  [<ffffffff81616767>] dev_hard_start_xmit+0x247/0x520
[   95.409670]  [<ffffffffa00b1520>] ? e1000_check_options+0x77d/0x6745 [e1000]
[   95.409670]  [<ffffffff816379e0>] sch_direct_xmit+0x100/0x1e0
[   95.409670]  [<ffffffff81616e09>] dev_queue_xmit+0x179/0x440
[   95.409670]  [<ffffffff81652547>] ip_finish_output+0x2c7/0x3c0
[   95.409670]  [<ffffffff81652fac>] ip_output+0x5c/0xa0
[   95.409670]  [<ffffffff8165271f>] ip_local_out+0x2f/0x40
[   95.409670]  [<ffffffff81652881>] ip_queue_xmit+0x151/0x3d0
[   95.409670]  [<ffffffff8160550e>] ? __skb_clone+0x2e/0x130
[   95.409670]  [<ffffffff8166cdb3>] tcp_transmit_skb+0x3f3/0x920
[   95.409670]  [<ffffffff8166d4c3>] tcp_write_xmit+0x123/0xa70
[   95.409670]  [<ffffffff8166de82>] __tcp_push_pending_frames+0x32/0xa0
[   95.409670]  [<ffffffff8166eb23>] tcp_send_fin+0x83/0x1d0
[   95.409670]  [<ffffffff81661728>] tcp_close+0x398/0x430
[   95.409670]  [<ffffffff81685cea>] inet_release+0x7a/0x90
[   95.409670]  [<ffffffff815f7d5e>] sock_release+0x2e/0xa0
[   95.409670]  [<ffffffff815f8147>] sock_close+0x17/0x30
[   95.409670]  [<ffffffff811899fc>] __fput+0xec/0x250
[   95.409670]  [<ffffffff81189b6e>] ____fput+0xe/0x20
[   95.409670]  [<ffffffff810add08>] task_work_run+0xb8/0xe0
[   95.409670]  [<ffffffff8108ea74>] do_exit+0x1a4/0x8e0
[   95.409670]  [<ffffffff811aa48f>] ? mnt_drop_write+0x1f/0x30
[   95.409670]  [<ffffffff81185d19>] ? filp_close+0x69/0xa0
[   95.409670]  [<ffffffff8108f65f>] do_group_exit+0x3f/0xa0
[   95.409670]  [<ffffffff8108f6d7>] sys_exit_group+0x17/0x20
[   95.409670]  [<ffffffff816f34a8>] system_call_fastpath+0x18/0x1d
[   95.409670]  [<ffffffff816f34ce>] ? sysret_check+0x1c/0x58
Unmounting remot


UPDATE: Oooh, I got another panic just doing routine system operations with this bad kernel. I'd run a grep that produced a great deal of output, and it panicked about a second into the output.

Code:
isengard login: [ 1278.144943] grsec: time set by /usr/sbin/ntpd[ntpd:1085] uid/euid:0/0 gid/egid:0/0, parent /sbin/ini0
[ 3653.293345] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 3653.294002] Pid: 1128, comm: sshd Not tainted 3.7.1-grsecdebug #5
[ 3653.294002] Call Trace:
[ 3653.294002]  <IRQ>  [<ffffffff816e5fbd>] panic+0xbb/0x1c5
[ 3653.294002]  [<ffffffff8108f180>] do_exit+0x8b0/0x8e0
[ 3653.294002]  [<ffffffff810065cc>] ? show_trace_log_lvl+0x5c/0x80
[ 3653.294002]  [<ffffffff8108f65f>] do_group_exit+0x3f/0xa0
[ 3653.294002]  [<ffffffff8118dc53>] report_size_overflow+0x33/0x40
[ 3653.294002]  [<ffffffff81606c17>] skb_pad+0x197/0x1c0
[ 3653.294002]  [<ffffffff81407f38>] ? swiotlb_dma_mapping_error+0x18/0x30
[ 3653.294002]  [<ffffffffa00d45d8>] e1000_xmit_frame+0x528/0x1050 [e1000]
[ 3653.294002]  [<ffffffff81616767>] dev_hard_start_xmit+0x247/0x520
[ 3653.294002]  [<ffffffffa00e5520>] ? e1000_check_options+0x77d/0x6745 [e1000]
[ 3653.294002]  [<ffffffff816379e0>] sch_direct_xmit+0x100/0x1e0
[ 3653.294002]  [<ffffffff81616e09>] dev_queue_xmit+0x179/0x440
[ 3653.294002]  [<ffffffff81652547>] ip_finish_output+0x2c7/0x3c0
[ 3653.294002]  [<ffffffff81652fac>] ip_output+0x5c/0xa0
[ 3653.294002]  [<ffffffff8165271f>] ip_local_out+0x2f/0x40
[ 3653.294002]  [<ffffffff81652881>] ip_queue_xmit+0x151/0x3d0
[ 3653.294002]  [<ffffffff8160550e>] ? __skb_clone+0x2e/0x130
[ 3653.294002]  [<ffffffff8166cdb3>] tcp_transmit_skb+0x3f3/0x920
[ 3653.294002]  [<ffffffff8166d4c3>] tcp_write_xmit+0x123/0xa70
[ 3653.294002]  [<ffffffff8166de82>] __tcp_push_pending_frames+0x32/0xa0
[ 3653.294002]  [<ffffffff816697ad>] tcp_rcv_established+0x32d/0x9f0
[ 3653.294002]  [<ffffffff81669781>] ? tcp_rcv_established+0x301/0x9f0
[ 3653.294002]  [<ffffffff81672e31>] tcp_v4_do_rcv+0xe1/0x350
[ 3653.294002]  [<ffffffff813735ec>] ? security_sock_rcv_skb+0x1c/0x30
[ 3653.294002]  [<ffffffff8162cb47>] ? sk_filter+0x37/0xe0
[ 3653.294002]  [<ffffffff8167509c>] tcp_v4_rcv+0x67c/0x990
[ 3653.294002]  [<ffffffff816e6871>] ? nohz_balance_exit_idle.part.44+0x16/0x43
[ 3653.294002]  [<ffffffff8164d013>] ip_local_deliver_finish+0xf3/0x270
[ 3653.294002]  [<ffffffff8164d31e>] ip_local_deliver+0x4e/0x90
[ 3653.294002]  [<ffffffff810d99b0>] ? tick_nohz_handler+0x100/0x100
[ 3653.294002]  [<ffffffff8164cc66>] ip_rcv_finish+0x86/0x340
[ 3653.294002]  [<ffffffff8164d593>] ip_rcv+0x233/0x390
[ 3653.294002]  [<ffffffff81614ae4>] __netif_receive_skb+0x224/0x800
[ 3653.294002]  [<ffffffff81615263>] netif_receive_skb+0x23/0x80
[ 3653.294002]  [<ffffffff8161564c>] ? dev_gro_receive+0x18c/0x270
[ 3653.294002]  [<ffffffff816153b8>] napi_skb_finish+0x58/0x80
[ 3653.294002]  [<ffffffff81615a65>] napi_gro_receive+0xf5/0x140
[ 3653.294002]  [<ffffffffa00d3534>] e1000_receive_skb+0x64/0x80 [e1000]
[ 3653.294002]  [<ffffffffa00d5c5b>] e1000_clean_rx_irq+0x22b/0x4f0 [e1000]
[ 3653.294002]  [<ffffffffa00d6983>] e1000_clean+0x263/0x910 [e1000]
[ 3653.294002]  [<ffffffff816e6871>] ? nohz_balance_exit_idle.part.44+0x16/0x43
[ 3653.294002]  [<ffffffff810cbd16>] ? trigger_load_balance+0x106/0x220
[ 3653.294002]  [<ffffffff810c2777>] ? scheduler_tick+0x107/0x160
[ 3653.294002]  [<ffffffff81615c40>] net_rx_action+0xb0/0x1b0
[ 3653.294002]  [<ffffffff811cd0ae>] ? fsnotify+0x4e/0x340
[ 3653.294002]  [<ffffffff81092363>] __do_softirq+0xb3/0x1e0
[ 3653.294002]  [<ffffffff8102aeb8>] ? ack_apic_level+0x88/0x150
[ 3653.294002]  [<ffffffff816f4a0c>] call_softirq+0x1c/0x30
[ 3653.294002]  [<ffffffff81004985>] do_softirq+0x55/0x90
[ 3653.294002]  [<ffffffff810925fd>] irq_exit+0x9d/0xb0
[ 3653.294002]  [<ffffffff816f5143>] do_IRQ+0x63/0xf0
[ 3653.294002]  [<ffffffff816f2790>] common_interrupt+0x90/0x90
[ 3653.294002]  <EOI>  [<ffffffff811cd0ae>] ? fsnotify+0x4e/0x340
[ 3653.294002]  [<ffffffff81188418>] vfs_write+0x158/0x200
[ 3653.294002]  [<ffffffff811887f2>] sys_write+0x52/0xa0
[ 3653.294002]  [<ffffffff816f34a8>] system_call_fastpath+0x18/0x1d
[ 3653.294002]  [<ffffffff816f34ce>] ? sysret_check+0x1c/0x58


vmlinux: http://www.mediafire.com/?cap1wyckck9or57
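This trace also explains the panic message. report_size_overflow() kills the current task via do_group_exit(), and do_exit() panics with "Aiee, killing interrupt handler!" when it's called in interrupt context. The kernel lists the innermost call first: the entries from `<IRQ>` down to common_interrupt ran on the interrupt stack (the e1000 receive path, in softirq context), and the entries after `<EOI>` belong to the interrupted task (sshd's write()). The shutdown traces have no `<IRQ>` marker, but, as far as I know, the transmit path runs with bottom halves disabled, which also counts as interrupt context for that check. A minimal Python sketch that splits a trace at those markers; `split_contexts` is a hypothetical helper.

```python
def split_contexts(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split one call trace into (interrupt-stack, task-stack) entries.

    Entries from an '<IRQ>' marker up to '<EOI>' ran on the interrupt
    stack; the '<EOI>' line and everything after it belong to the task
    that was interrupted. A trace without markers is all task stack.
    """
    irq: list[str] = []
    task: list[str] = []
    current = task
    for line in lines:
        if "<IRQ>" in line:
            current = irq
        elif "<EOI>" in line:
            current = task
        if "[<" in line:  # keep only call-trace entries, not headers
            current.append(line.strip())
    return irq, task
```

Pass one trace at a time: a dump that contains a second trace (such as the WARNING that follows a panic) has its own `<IRQ>`/`<EOI>` markers.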

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Wed Jan 02, 2013 9:28 am
by ephox
I think I fixed the bug, could you test it please? Here is the new version:
http://grsecurity.net/~ephox/overflow_p ... 20130102.c

Just follow these steps:
cp size_overflow_plugin-20130102.c linux-3.7.1/tools/gcc/size_overflow_plugin.c
make clean; make

Thanks :)

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Thu Jan 03, 2013 8:20 pm
by Neokernsec
Will do... I should have proper feedback on this within 10 hours or so. I'm looking forward to seeing it fully resolved! :)

Thank you, PaXies + Grsec team.

Incidentally, I'm not fully au fait with the new Linux kernel numbering scheme. Is 3.7 a "stable" branch? I'm also nervous about how rapidly the minor version changes, i.e., 3.0, 3.2, 3.4, 3.6, etc. I'm a guy who doesn't mind spending more time upfront to make sure a kernel is stable and well hardened, so as to reduce the number of patch-and-reboot cycles where possible. I've had fully hardened grsec-2.6 kernels running on systems for YEARS with no exploited services, etc., etc.

Thanks for your hard work, guys.

Doesn't look like it's fixed. I'll double-double-check that the patched plugin file was actually picked up by the build.

I did an extra-nasty shutdown for this:

ssh Session 1:

cat /usr/src/linux/*/*/* &
shutdown -h now

ssh Session 2: top

Code:
INIT: [ 6227.180706] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 6227.181321] Pid: 1103, comm: sshd Not tainted 3.7.1-grsec #5
[ 6227.181321] Call Trace:
[ 6227.181321]  <IRQ>  [<ffffffff817426ad>] panic+0xbb/0x1c5
[ 6227.181321]  [<ffffffff8108f270>] do_exit+0x8b0/0x8e0
[ 6227.181321]  [<ffffffff810065cc>] ? show_trace_log_lvl+0x5c/0x80
[ 6227.181321]  [<ffffffff8108f74f>] do_group_exit+0x3f/0xa0
[ 6227.181321]  [<ffffffff8118dfe3>] report_size_overflow+0x33/0x40
[ 6227.181321]  [<ffffffff8165d2bd>] skb_pad+0x1bd/0x1e0
[ 6227.181321]  [<ffffffff814089f8>] ? swiotlb_dma_mapping_error+0x18/0x30
[ 6227.181321]  [<ffffffff8154ce68>] e1000_xmit_frame+0x528/0x1050
[ 6227.181321]  [<ffffffff8166d017>] dev_hard_start_xmit+0x247/0x520
[ 6227.181321]  [<ffffffff8168ff60>] sch_direct_xmit+0x100/0x1e0
[ 6227.181321]  [<ffffffff8166d6b9>] dev_queue_xmit+0x179/0x440
[ 6227.181321]  [<ffffffff816aab77>] ip_finish_output+0x2c7/0x3c0
[ 6227.181321]  [<ffffffff816ab5dc>] ip_output+0x5c/0xa0
[ 6227.181321]  [<ffffffff816aad4f>] ip_local_out+0x2f/0x40
[ 6227.181321]  [<ffffffff816aaeb1>] ip_queue_xmit+0x151/0x3d0
[ 6227.181321]  [<ffffffff8165bb8e>] ? __skb_clone+0x2e/0x130
[ 6227.181321]  [<ffffffff816c54f3>] tcp_transmit_skb+0x3f3/0x920
[ 6227.181321]  [<ffffffff816c5c03>] tcp_write_xmit+0x123/0xa70
[ 6227.181321]  [<ffffffff816c65c2>] __tcp_push_pending_frames+0x32/0xa0
[ 6227.181321]  [<ffffffff816c1e9d>] tcp_rcv_established+0x32d/0x9f0
[ 6227.181321]  [<ffffffff810c148c>] ? ttwu_do_wakeup+0x2c/0xf0
[ 6227.181321]  [<ffffffff816cb5e1>] tcp_v4_do_rcv+0xe1/0x350
[ 6227.181321]  [<ffffffff8137406c>] ? security_sock_rcv_skb+0x1c/0x30
[ 6227.181321]  [<ffffffff81682fe7>] ? sk_filter+0x37/0xe0
[ 6227.181321]  [<ffffffff816cd84c>] tcp_v4_rcv+0x67c/0x990
[ 6227.181321]  [<ffffffff8174f852>] ? retint_restore_args+0x6/0xb
[ 6227.181321]  [<ffffffff8137406c>] ? security_sock_rcv_skb+0x1c/0x30
[ 6227.181321]  [<ffffffff816a55c3>] ip_local_deliver_finish+0xf3/0x270
[ 6227.181321]  [<ffffffff816a58ce>] ip_local_deliver+0x4e/0x90
[ 6227.181321]  [<ffffffff816a5216>] ip_rcv_finish+0x86/0x340
[ 6227.181321]  [<ffffffff816a5b43>] ip_rcv+0x233/0x390
[ 6227.181321]  [<ffffffff8166bb54>] __netif_receive_skb+0x724/0x8d0
[ 6227.181321]  [<ffffffff8166be93>] netif_receive_skb+0x23/0x80
[ 6227.181321]  [<ffffffff8166c2ac>] ? dev_gro_receive+0x1bc/0x2a0
[ 6227.181321]  [<ffffffff8166bfe8>] napi_skb_finish+0x58/0x80
[ 6227.181321]  [<ffffffff8166c6c5>] napi_gro_receive+0xf5/0x140
[ 6227.181321]  [<ffffffff8154bd64>] e1000_receive_skb+0x64/0x80
[ 6227.181321]  [<ffffffff8154e2eb>] e1000_clean_rx_irq+0x22b/0x4f0
[ 6227.181321]  [<ffffffff8154f215>] e1000_clean+0x265/0x910
[ 6227.181321]  [<ffffffff810c7616>] ? sched_slice.isra.40+0x46/0x90
[ 6227.181321]  [<ffffffff81742f61>] ? nohz_balance_exit_idle.part.44+0x16/0x43
[ 6227.181321]  [<ffffffff810cbe16>] ? trigger_load_balance+0x106/0x220
[ 6227.181321]  [<ffffffff810c2877>] ? scheduler_tick+0x107/0x160
[ 6227.181321]  [<ffffffff8166c936>] net_rx_action+0x146/0x240
[ 6227.181321]  [<ffffffff81092453>] __do_softirq+0xb3/0x1e0
[ 6227.181321]  [<ffffffff8102af28>] ? ack_apic_level+0x88/0x150
[ 6227.181321]  [<ffffffff81751a8c>] call_softirq+0x1c/0x30
[ 6227.181321]  [<ffffffff81004985>] do_softirq+0x55/0x90
[ 6227.181321]  [<ffffffff810926ed>] irq_exit+0x9d/0xb0
[ 6227.181321]  [<ffffffff817521c3>] do_IRQ+0x63/0xf0
[ 6227.181321]  [<ffffffff8174f810>] common_interrupt+0x90/0x90
[ 6227.181321]  <EOI>  [<ffffffff810c91e9>] ? dequeue_entity+0x89/0x1a0
[ 6227.181321]  [<ffffffff810c031e>] ? finish_task_switch+0x4e/0xf0
[ 6227.181321]  [<ffffffff8174dd41>] __schedule+0x331/0x800
[ 6227.181321]  [<ffffffff8174e549>] schedule+0x29/0x70
[ 6227.181321]  [<ffffffff8174c97c>] schedule_timeout+0x1fc/0x2c0
[ 6227.181321]  [<ffffffff8174e372>] wait_for_common+0xd2/0x170
[ 6227.181321]  [<ffffffff810c4100>] ? try_to_wake_up+0x290/0x290
[ 6227.181321]  [<ffffffff8174e50d>] wait_for_completion+0x1d/0x30
[ 6227.181321]  [<ffffffff810a9711>] flush_work+0xf1/0x180
[ 6227.181321]  [<ffffffff810a8910>] ? gcwq_release_assoc_and_unlock+0x50/0x50
[ 6227.181321]  [<ffffffff81180202>] ? kmem_cache_open+0x102/0x400
[ 6227.181321]  [<ffffffff8148cf05>] tty_flush_to_ldisc+0x15/0x20
[ 6227.181321]  [<ffffffff81487222>] n_tty_read+0x1e2/0x980
[ 6227.181321]  [<ffffffff810c4100>] ? try_to_wake_up+0x290/0x290
[ 6227.181321]  [<ffffffff8148210f>] tty_read+0x9f/0x110
[ 6227.181321]  [<ffffffff81188928>] vfs_read+0xd8/0x240
[ 6227.181321]  [<ffffffff81188ae2>] sys_read+0x52/0xa0
[ 6227.181321]  [<ffffffff810a0d8a>] ? sigprocmask+0x4a/0x90
[ 6227.181321]  [<ffffffff81750528>] system_call_fastpath+0x18/0x1d
[ 6228.009913] ------------[ cut here ]------------
[ 6228.010910] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x62/0x70()
[ 6228.010910] Hardware name: X6DHR-8G2
[ 6228.010910] Modules linked in: parport cdrom microcode intel_rng button
[ 6228.010910] Pid: 1103, comm: sshd Not tainted 3.7.1-grsec #5
[ 6228.010910] Call Trace:
[ 6228.010910]  <IRQ>  [<ffffffff81088b0f>] warn_slowpath_common+0x7f/0xc0
[ 6228.010910]  [<ffffffff81088b6a>] warn_slowpath_null+0x1a/0x20
[ 6228.010910]  [<ffffffff81027222>] native_smp_send_reschedule+0x62/0x70
[ 6228.010910]  [<ffffffff810cbea2>] trigger_load_balance+0x192/0x220
[ 6228.010910]  [<ffffffff810c2877>] scheduler_tick+0x107/0x160
[ 6228.010910]  [<ffffffff810d9ab0>] ? tick_nohz_handler+0x100/0x100
[ 6228.010910]  [<ffffffff8109c599>] update_process_times+0x69/0x90
[ 6228.010910]  [<ffffffff810d9b33>] tick_sched_timer+0x83/0xd0
[ 6228.010910]  [<ffffffff810b6c62>] __run_hrtimer+0x72/0x1e0
[ 6228.010910]  [<ffffffff810d9ab0>] ? tick_nohz_handler+0x100/0x100
[ 6228.010910]  [<ffffffff810b75b7>] hrtimer_interrupt+0xf7/0x220
[ 6228.010910]  [<ffffffff817522c5>] smp_apic_timer_interrupt+0x75/0xa9
[ 6228.010910]  [<ffffffff81751050>] apic_timer_interrupt+0x90/0xa0
[ 6228.010910]  [<ffffffff81742778>] ? panic+0x186/0x1c5
[ 6228.010910]  [<ffffffff8108f270>] do_exit+0x8b0/0x8e0
[ 6228.010910]  [<ffffffff810065cc>] ? show_trace_log_lvl+0x5c/0x80
[ 6228.010910]  [<ffffffff8108f74f>] do_group_exit+0x3f/0xa0
[ 6228.010910]  [<ffffffff8118dfe3>] report_size_overflow+0x33/0x40
[ 6228.010910]  [<ffffffff8165d2bd>] skb_pad+0x1bd/0x1e0
[ 6228.010910]  [<ffffffff814089f8>] ? swiotlb_dma_mapping_error+0x18/0x30
[ 6228.010910]  [<ffffffff8154ce68>] e1000_xmit_frame+0x528/0x1050
[ 6228.010910]  [<ffffffff8166d017>] dev_hard_start_xmit+0x247/0x520
[ 6228.010910]  [<ffffffff8168ff60>] sch_direct_xmit+0x100/0x1e0
[ 6228.010910]  [<ffffffff8166d6b9>] dev_queue_xmit+0x179/0x440
[ 6228.010910]  [<ffffffff816aab77>] ip_finish_output+0x2c7/0x3c0
[ 6228.010910]  [<ffffffff816ab5dc>] ip_output+0x5c/0xa0
[ 6228.010910]  [<ffffffff816aad4f>] ip_local_out+0x2f/0x40
[ 6228.010910]  [<ffffffff816aaeb1>] ip_queue_xmit+0x151/0x3d0
[ 6228.010910]  [<ffffffff8165bb8e>] ? __skb_clone+0x2e/0x130
[ 6228.010910]  [<ffffffff816c54f3>] tcp_transmit_skb+0x3f3/0x920
[ 6228.010910]  [<ffffffff816c5c03>] tcp_write_xmit+0x123/0xa70
[ 6228.010910]  [<ffffffff816c65c2>] __tcp_push_pending_frames+0x32/0xa0
[ 6228.010910]  [<ffffffff816c1e9d>] tcp_rcv_established+0x32d/0x9f0
[ 6228.010910]  [<ffffffff810c148c>] ? ttwu_do_wakeup+0x2c/0xf0
[ 6228.010910]  [<ffffffff816cb5e1>] tcp_v4_do_rcv+0xe1/0x350
[ 6228.010910]  [<ffffffff8137406c>] ? security_sock_rcv_skb+0x1c/0x30
[ 6228.010910]  [<ffffffff81682fe7>] ? sk_filter+0x37/0xe0
[ 6228.010910]  [<ffffffff816cd84c>] tcp_v4_rcv+0x67c/0x990
[ 6228.010910]  [<ffffffff8174f852>] ? retint_restore_args+0x6/0xb
[ 6228.010910]  [<ffffffff8137406c>] ? security_sock_rcv_skb+0x1c/0x30
[ 6228.010910]  [<ffffffff816a55c3>] ip_local_deliver_finish+0xf3/0x270
[ 6228.010910]  [<ffffffff816a58ce>] ip_local_deliver+0x4e/0x90
[ 6228.010910]  [<ffffffff816a5216>] ip_rcv_finish+0x86/0x340
[ 6228.010910]  [<ffffffff816a5b43>] ip_rcv+0x233/0x390
[ 6228.010910]  [<ffffffff8166bb54>] __netif_receive_skb+0x724/0x8d0
[ 6228.010910]  [<ffffffff8166be93>] netif_receive_skb+0x23/0x80
[ 6228.010910]  [<ffffffff8166c2ac>] ? dev_gro_receive+0x1bc/0x2a0
[ 6228.010910]  [<ffffffff8166bfe8>] napi_skb_finish+0x58/0x80
[ 6228.010910]  [<ffffffff8166c6c5>] napi_gro_receive+0xf5/0x140
[ 6228.010910]  [<ffffffff8154bd64>] e1000_receive_skb+0x64/0x80
[ 6228.010910]  [<ffffffff8154e2eb>] e1000_clean_rx_irq+0x22b/0x4f0
[ 6228.010910]  [<ffffffff8154f215>] e1000_clean+0x265/0x910
[ 6228.010910]  [<ffffffff810c7616>] ? sched_slice.isra.40+0x46/0x90
[ 6228.010910]  [<ffffffff81742f61>] ? nohz_balance_exit_idle.part.44+0x16/0x43
[ 6228.010910]  [<ffffffff810cbe16>] ? trigger_load_balance+0x106/0x220
[ 6228.010910]  [<ffffffff810c2877>] ? scheduler_tick+0x107/0x160
[ 6228.010910]  [<ffffffff8166c936>] net_rx_action+0x146/0x240
[ 6228.010910]  [<ffffffff81092453>] __do_softirq+0xb3/0x1e0
[ 6228.010910]  [<ffffffff8102af28>] ? ack_apic_level+0x88/0x150
[ 6228.010910]  [<ffffffff81751a8c>] call_softirq+0x1c/0x30
[ 6228.010910]  [<ffffffff81004985>] do_softirq+0x55/0x90
[ 6228.010910]  [<ffffffff810926ed>] irq_exit+0x9d/0xb0
[ 6228.010910]  [<ffffffff817521c3>] do_IRQ+0x63/0xf0
[ 6228.010910]  [<ffffffff8174f810>] common_interrupt+0x90/0x90
[ 6228.010910]  <EOI>  [<ffffffff810c91e9>] ? dequeue_entity+0x89/0x1a0
[ 6228.010910]  [<ffffffff810c031e>] ? finish_task_switch+0x4e/0xf0
[ 6228.010910]  [<ffffffff8174dd41>] __schedule+0x331/0x800
[ 6228.010910]  [<ffffffff8174e549>] schedule+0x29/0x70
[ 6228.010910]  [<ffffffff8174c97c>] schedule_timeout+0x1fc/0x2c0
[ 6228.010910]  [<ffffffff8174e372>] wait_for_common+0xd2/0x170
[ 6228.010910]  [<ffffffff810c4100>] ? try_to_wake_up+0x290/0x290
[ 6228.010910]  [<ffffffff8174e50d>] wait_for_completion+0x1d/0x30
[ 6228.010910]  [<ffffffff810a9711>] flush_work+0xf1/0x180
[ 6228.010910]  [<ffffffff810a8910>] ? gcwq_release_assoc_and_unlock+0x50/0x50
[ 6228.010910]  [<ffffffff81180202>] ? kmem_cache_open+0x102/0x400
[ 6228.010910]  [<ffffffff8148cf05>] tty_flush_to_ldisc+0x15/0x20
[ 6228.010910]  [<ffffffff81487222>] n_tty_read+0x1e2/0x980
[ 6228.010910]  [<ffffffff810c4100>] ? try_to_wake_up+0x290/0x290
[ 6228.010910]  [<ffffffff8148210f>] tty_read+0x9f/0x110
[ 6228.010910]  [<ffffffff81188928>] vfs_read+0xd8/0x240
[ 6228.010910]  [<ffffffff81188ae2>] sys_read+0x52/0xa0
[ 6228.010910]  [<ffffffff810a0d8a>] ? sigprocmask+0x4a/0x90
[ 6228.010910]  [<ffffffff81750528>] system_call_fastpath+0x18/0x1d
[ 6228.010910] ---[ end trace e3d42ca97269f47b ]---


Addendum: Here are the current configs + kernels:

vmlinux: http://www.mediafire.com/?ii64637y65kyz85
System.map: http://www.mediafire.com/?28i3b2g8lxo7e8q
.config: http://www.mediafire.com/?z6gtub334azgj5u

Verified crash dump example:
Code:
Welcome to Linux 3.7.1-grsec (ttyS0)

[  111.032976] Kernel panic - not syncing: Aiee, killing interrupt handler!
[  111.033834] Pid: 0, comm: swapper/1 Not tainted 3.7.1-grsec #5
[  111.033834] Call Trace:
[  111.033834]  <IRQ>  [<ffffffff817356c5>] panic+0xbb/0x1c5
[  111.033834]  [<ffffffff8108f270>] do_exit+0x8b0/0x8e0
[  111.033834]  [<ffffffff810065cc>] ? show_trace_log_lvl+0x5c/0x80
[  111.033834]  [<ffffffff8108f74f>] do_group_exit+0x3f/0xa0
[  111.033834]  [<ffffffff8118dfe3>] report_size_overflow+0x33/0x40
[  111.033834]  [<ffffffff8164fabd>] skb_pad+0x1bd/0x1e0
[  111.033834]  [<ffffffff81544608>] e1000_xmit_frame+0x528/0x1050
[  111.033834]  [<ffffffff8165f837>] dev_hard_start_xmit+0x247/0x520
[  111.033834]  [<ffffffff816827c0>] sch_direct_xmit+0x100/0x1e0
[  111.033834]  [<ffffffff8168294f>] __qdisc_run+0xaf/0x140
[  111.033834]  [<ffffffff8165c771>] net_tx_action+0xf1/0x1b0
[  111.033834]  [<ffffffff81092453>] __do_softirq+0xb3/0x1e0
[  111.033834]  [<ffffffff8102af28>] ? ack_apic_level+0x88/0x150
[  111.033834]  [<ffffffff8174498c>] call_softirq+0x1c/0x30
[  111.033834]  [<ffffffff81004985>] do_softirq+0x55/0x90
[  111.033834]  [<ffffffff810926ed>] irq_exit+0x9d/0xb0
[  111.033834]  [<ffffffff817450c3>] do_IRQ+0x63/0xf0
[  111.033834]  [<ffffffff81742710>] common_interrupt+0x90/0x90
[  111.033834]  <EOI>  [<ffffffff81742752>] ? retint_restore_args+0x6/0xb
[  111.033834]  [<ffffffff8100c045>] ? mwait_idle+0x85/0x1d0
[  111.033834]  [<ffffffff8100ca3c>] cpu_idle+0xfc/0x120
[  111.286087]  [<ffffffff817300d4>] start_secondary+0x1e5/0x1f0


Series of build commands:
Code:
tar axf /root/linux-3.7.1grsec.tar.xz
cp /root/config-very-last-with-modules linux/.config
cp /root/size_overflow_plugin-20130102.c linux/tools/gcc/size_overflow_plugin.c
cd linux
ls -l tools/gcc/size_overflow_plugin.c
  -rw-r--r-- 1 root root   53348 Jan  4 08:39 size_overflow_plugin.c
make clean ; make -j3 && make -j3 modules_install && make bzImage

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Fri Jan 04, 2013 7:44 pm
by ephox
I hope I fixed the bug :) Here is the new version:
http://www.grsecurity.net/~ephox/overfl ... 20130104.c
Could you test it please?

If the bug isn't fixed, could you please do the following:
* send me the vmlinux and the backtrace
* rm -f net/core/skbuff.o; make net/core/skbuff.o EXTRA_CFLAGS=-fdump-tree-all, and send me all the net/core/skbuff.c.* files
* make net/core/skbuff.s, and send me the net/core/skbuff.s
You can send these files by e-mail ;)

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Sat Jan 05, 2013 6:13 pm
by Neokernsec
Still not fixed, alas.

vmlinux, System.map, and config: http://www.mediafire.com/?bnon8e96gax66

I'll generate the additional files you've requested shortly. The above files are the "stock" ones WITHOUT any EXTRA_CFLAGS.

Stacktrace:

Code:
Welcome to Linux 3.7.1-grsec (ttyS0)

isengard login: INITIRunning shutdown script /etc/rc.d/rc.6:
Saving system time to the hardware clock (UTC).
[  262.596846] grsec: time set by /sbin/hwclock[hwclock:1215] uid/euid:0/0 gid/egid:0/0, parent /etc/rc.d/rc.6[rc.6:1210
] uid/euid:0/0 gid/egid:0/0
[  262.647137] Kernel panic - not syncing: Aiee, killing interrupt handler!
Unmounting remot[  262.655639] Pid: 1140, comm: sshd Not tainted 3.7.1-grsec #5
e filesystems.
[  262.674693] Call Trace:
[  262.674693]  [<ffffffff81736c95>] panic+0xbb/0x1c5
[  262.685428]  [<ffffffff8108f390>] do_exit+0x8b0/0x8e0
[  262.685428]  [<ffffffff810065bc>] ? show_trace_log_lvl+0x5c/0x80
[  262.685428]  [<ffffffff8108f86f>] do_group_exit+0x3f/0xa0
[  262.685428]  [<ffffffff8118e263>] report_size_overflow+0x33/0x40
[  262.685428]  [<ffffffff81650efd>] skb_pad+0x1dd/0x200
[  262.685428]  [<ffffffff81286ade>] ? do_get_write_access+0x3be/0x4b0
[  262.685428]  [<ffffffff81546518>] e1000_xmit_frame+0x528/0x1000
[  262.685428]  [<ffffffff811c1f2d>] ? __getblk+0x2d/0x2e0
[  262.685428]  [<ffffffff81286ade>] ? do_get_write_access+0x3be/0x4b0
[  262.685428]  [<ffffffff81660d17>] dev_hard_start_xmit+0x247/0x520
[  262.685428]  [<ffffffff81683f60>] sch_direct_xmit+0x100/0x1e0
[  262.685428]  [<ffffffff816613b9>] dev_queue_xmit+0x179/0x440
[  262.685428]  [<ffffffff8169ec37>] ip_finish_output+0x2c7/0x3c0
[  262.685428]  [<ffffffff8169f69c>] ip_output+0x5c/0xa0
[  262.685428]  [<ffffffff8169ee0f>] ip_local_out+0x2f/0x40
[  262.685428]  [<ffffffff8169ef71>] ip_queue_xmit+0x151/0x3d0
[  262.685428]  [<ffffffff8164f7ee>] ? __skb_clone+0x2e/0x130
[  262.685428]  [<ffffffff816b9383>] tcp_transmit_skb+0x3f3/0x920
[  262.685428]  [<ffffffff816b9a93>] tcp_write_xmit+0x123/0xa70
[  262.685428]  [<ffffffff816ba452>] __tcp_push_pending_frames+0x32/0xa0
[  262.685428]  [<ffffffff816bb0d3>] tcp_send_fin+0x83/0x1d0
[  262.685428]  [<ffffffff816add48>] tcp_close+0x398/0x430
[  262.685428]  [<ffffffff816d22aa>] inet_release+0x7a/0x90
[  262.685428]  [<ffffffff816420de>] sock_release+0x2e/0xa0
[  262.685428]  [<ffffffff816424c7>] sock_close+0x17/0x30
[  262.685428]  [<ffffffff8118a00c>] __fput+0xec/0x250
[  262.685428]  [<ffffffff8118a17e>] ____fput+0xe/0x20
[  262.685428]  [<ffffffff810adfd8>] task_work_run+0xb8/0xe0
[  262.685428]  [<ffffffff8108ec84>] do_exit+0x1a4/0x8e0
[  262.685428]  [<ffffffff811aadcf>] ? mnt_drop_write+0x1f/0x30
[  262.685428]  [<ffffffff811864bc>] ? chmod_common+0x8c/0xe0
[  262.685428]  [<ffffffff8108f86f>] do_group_exit+0x3f/0xa0
[  262.685428]  [<ffffffff8108f8e7>] sys_exit_group+0x17/0x20
[  262.685428]  [<ffffffff81744b68>] system_call_fastpath+0x18/0x1d
[  262.685428]  [<ffffffff81186329>] ? filp_close+0x69/0xa0
[  262.685428]  [<ffffffff811a94b6>] ? __close_fd+0x76/0x90
[  262.685428]  [<ffffffff81744b8e>] ? sysret_check+0x1c/0x58


I had three ssh sessions open: 1) cat /usr/src/linux/*/*/*.c, 2) top, and 3) shutdown -r now.

Addendum: skbuff files sent to you via PM.

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Sun Jan 06, 2013 11:43 am
by ephox
Hi,

I forgot this patch:
Code:
--- net/core/skbuff.c.orig      2013-01-06 16:18:32.532651931 +0100
+++ net/core/skbuff.c   2013-01-06 16:37:07.700676720 +0100
@@ -1197,6 +1197,8 @@
                return 0;
        }
 
+       if (pad < 0)
+               printk(KERN_ERR "data_len: %x pad: %x end: %x tail: %x\n", skb->data_len, pad, skb->end, skb->tail);
        ntail = skb->data_len + pad - (skb->end - skb->tail);
        if (likely(skb_cloned(skb) || ntail > 0)) {
                err = pskb_expand_head(skb, 0, ntail, GFP_ATOMIC);


Could you apply it and send me the printk output? If you don't see any output, just apply the patch without the "if".

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Sun Jan 06, 2013 7:21 pm
by Neokernsec
ephox wrote: Could you apply it and send me the printk result? If you don't see any results just apply this patch without the "if".
Try #1: crash, but no KERN_ERR output. Removing the conditional test in skbuff.c and retesting.

Stackdump:

Code:
INITRunning shutdown script /etc/rc.d/rc.6:
Saving system time to the hardware clock (UTC).
[ 2355.995956] grsec: time set by /sbin/hwclock[hwclock:4357] uid/euid:0/0 gid/egid:0/0, parent /etc/rc.d/rc.6[rc.6:4350
[ 2356.039295] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 2356.040010] Pid: 1152, comm: sshd Not tainted 3.7.1-grsec #6
[ 2356.040010] Call Trace:
[ 2356.040010]  [<ffffffff81736cd5>] panic+0xbb/0x1c5
[ 2356.040010]  [<ffffffff8108f390>] do_exit+0x8b0/0x8e0
[ 2356.040010]  [<ffffffff810065bc>] ? show_trace_log_lvl+0x5c/0x80
[ 2356.040010]  [<ffffffff8108f86f>] do_group_exit+0x3f/0xa0
[ 2356.040010]  [<ffffffff8118e263>] report_size_overflow+0x33/0x40
[ 2356.040010]  [<ffffffff81650ecf>] skb_pad+0x1af/0x240
[ 2356.040010]  [<ffffffff810bf523>] ? __wake_up+0x53/0x70
[ 2356.040010]  [<ffffffff81546518>] e1000_xmit_frame+0x528/0x1000
[ 2356.040010]  [<ffffffff810b27fe>] ? wake_up_bit+0x2e/0x40
[ 2356.040010]  [<ffffffff81286ade>] ? do_get_write_access+0x3be/0x4b0
[ 2356.040010]  [<ffffffff811bfc70>] ? __find_get_block_slow+0xd0/0x190
[ 2356.040010]  [<ffffffff81660d57>] dev_hard_start_xmit+0x247/0x520
[ 2356.040010]  [<ffffffff81683fa0>] sch_direct_xmit+0x100/0x1e0
[ 2356.040010]  [<ffffffff816613f9>] dev_queue_xmit+0x179/0x440
[ 2356.040010]  [<ffffffff81670a8a>] neigh_resolve_output+0x12a/0x210
[ 2356.040010]  [<ffffffff8169eb75>] ip_finish_output+0x1c5/0x3c0
[ 2356.040010]  [<ffffffff8169f6dc>] ip_output+0x5c/0xa0
[ 2356.040010]  [<ffffffff8169ee4f>] ip_local_out+0x2f/0x40
[ 2356.040010]  [<ffffffff8169efb1>] ip_queue_xmit+0x151/0x3d0
[ 2356.040010]  [<ffffffff8164f7ee>] ? __skb_clone+0x2e/0x130
[ 2356.040010]  [<ffffffff816b93c3>] tcp_transmit_skb+0x3f3/0x920
[ 2356.040010]  [<ffffffff816b9ad3>] tcp_write_xmit+0x123/0xa70
[ 2356.040010]  [<ffffffff816ba492>] __tcp_push_pending_frames+0x32/0xa0
[ 2356.040010]  [<ffffffff816bb113>] tcp_send_fin+0x83/0x1d0
[ 2356.040010]  [<ffffffff816add88>] tcp_close+0x398/0x430
[ 2356.040010]  [<ffffffff816d22ea>] inet_release+0x7a/0x90
[ 2356.040010]  [<ffffffff816420de>] sock_release+0x2e/0xa0
[ 2356.040010]  [<ffffffff816424c7>] sock_close+0x17/0x30
[ 2356.040010]  [<ffffffff8118a00c>] __fput+0xec/0x250
[ 2356.040010]  [<ffffffff8118a17e>] ____fput+0xe/0x20
[ 2356.402315]  [<ffffffff810adfd8>] task_work_run+0xb8/0xe0
[ 2356.406756]  [<ffffffff8108ec84>] do_exit+0x1a4/0x8e0
[ 2356.406756]  [<ffffffff81186329>] ? filp_close+0x69/0xa0
[ 2356.406756]  [<ffffffff811a94b6>] ? __close_fd+0x76/0x90
[ 2356.406756]  [<ffffffff8108f86f>] do_group_exit+0x3f/0xa0
[ 2356.406756]  [<ffffffff8108f8e7>] sys_exit_group+0x17/0x20
[ 2356.406756]  [<ffffffff81744ba8>] system_call_fastpath+0x18/0x1d
Unmounting remot


Try #2:

Preparation & validation:
Code:
/usr/src/linux: $ grep -3 KERN_ERR net/core/skbuff.c
        }


        printk(KERN_ERR "data_len: %x pad: %x end: %x tail: %x\n", skb->data_len, pad, skb->end, skb->tail);
        ntail = skb->data_len + pad - (skb->end - skb->tail);
        if (likely(skb_cloned(skb) || ntail > 0)) {
                err = pskb_expand_head(skb, 0, ntail, GFP_ATOMIC);

#
/usr/src/linux: $ grep -i SIZE .config
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
CONFIG_TASK_SIZE_MAX_SHIFT=42
CONFIG_PAX_SIZE_OVERFLOW=y


make -j3 && make -j3 modules_install && make -j2 bzImage

Kernel, config, etc: http://www.mediafire.com/?c0btxf0reqs3e

Stacktrace:
Code:
isengard login: [  316.736175] Kernel panic - not syncing: Aiee, killing interrupt handler!
[  316.737005] Pid: 1176, comm: lynx Not tainted 3.7.1-grsec #7
[  316.737005] Call Trace:
[  316.737005]  [<ffffffff81736ca5>] panic+0xbb/0x1c5
[  316.737005]  [<ffffffff8108f390>] do_exit+0x8b0/0x8e0
[  316.737005]  [<ffffffff810065bc>] ? show_trace_log_lvl+0x5c/0x80
[  316.737005]  [<ffffffff8108f86f>] do_group_exit+0x3f/0xa0
[  316.803004]  [<ffffffff8118e263>] report_size_overflow+0x33/0x40
[  316.803004]  [<ffffffff81650f02>] skb_pad+0x1e2/0x210
[  316.803004]  [<ffffffff81546518>] e1000_xmit_frame+0x528/0x1000
[  316.803004]  [<ffffffff810bf523>] ? __wake_up+0x53/0x70
[  316.803004]  [<ffffffff8148c1c4>] ? put_ldisc+0x44/0xb0
[  316.803004]  [<ffffffff81660d27>] dev_hard_start_xmit+0x247/0x520
[  316.803004]  [<ffffffff81683f70>] sch_direct_xmit+0x100/0x1e0
[  316.803004]  [<ffffffff816613c9>] dev_queue_xmit+0x179/0x440
[  316.803004]  [<ffffffff8169ec47>] ip_finish_output+0x2c7/0x3c0
[  316.803004]  [<ffffffff8169f6ac>] ip_output+0x5c/0xa0
[  316.803004]  [<ffffffff810c16b2>] ? check_preempt_curr+0x92/0xb0
[  316.803004]  [<ffffffff8169ee1f>] ip_local_out+0x2f/0x40
[  316.803004]  [<ffffffff8169ef81>] ip_queue_xmit+0x151/0x3d0
[  316.803004]  [<ffffffff8164f7ee>] ? __skb_clone+0x2e/0x130
[  316.803004]  [<ffffffff816b9393>] tcp_transmit_skb+0x3f3/0x920
[  316.803004]  [<ffffffff810c9449>] ? dequeue_entity+0x89/0x1a0
[  316.803004]  [<ffffffff816b9aa3>] tcp_write_xmit+0x123/0xa70
[  316.803004]  [<ffffffff816ba462>] __tcp_push_pending_frames+0x32/0xa0
[  316.803004]  [<ffffffff816bb0e3>] tcp_send_fin+0x83/0x1d0
[  316.803004]  [<ffffffff816add58>] tcp_close+0x398/0x430
[  316.803004]  [<ffffffff816d22ba>] inet_release+0x7a/0x90
[  316.803004]  [<ffffffff816420de>] sock_release+0x2e/0xa0
[  316.803004]  [<ffffffff816424c7>] sock_close+0x17/0x30
[  316.803004]  [<ffffffff8118a00c>] __fput+0xec/0x250
[  316.803004]  [<ffffffff8118a17e>] ____fput+0xe/0x20
[  316.803004]  [<ffffffff810adfd8>] task_work_run+0xb8/0xe0
[  316.803004]  [<ffffffff81002a2e>] do_notify_resume+0x5e/0x80
[  316.803004]  [<ffffffff81744e1a>] int_signal+0x12/0x17

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Tue Jan 08, 2013 5:18 pm
by ephox
This patch will surely print out my message :) Could you apply it and trigger this panic a few times please?
Code:
--- net/core/skbuff.c.orig      2013-01-06 16:18:32.532651931 +0100
+++ net/core/skbuff.c   2013-01-08 21:47:52.925233348 +0100
@@ -1186,6 +1186,11 @@
  *     May return error in out of memory cases. The skb is freed on error.
  */
 
+unsigned int ephox_data_len;
+unsigned int ephox_end;
+unsigned int ephox_tail;
+int ephox_pad;
+
 int skb_pad(struct sk_buff *skb, int pad)
 {
        int err;
@@ -1197,6 +1202,10 @@
                return 0;
        }
 
+       ephox_data_len = skb->data_len;
+       ephox_end = skb->end;
+       ephox_tail = skb->tail;
+       ephox_pad = pad;
        ntail = skb->data_len + pad - (skb->end - skb->tail);
        if (likely(skb_cloned(skb) || ntail > 0)) {
                err = pskb_expand_head(skb, 0, ntail, GFP_ATOMIC);
--- kernel/panic.c.orig 2013-01-08 21:52:17.649239233 +0100
+++ kernel/panic.c      2013-01-08 21:56:18.753244592 +0100
@@ -66,6 +66,12 @@
  *
  *     This function never returns.
  */
+
+extern unsigned int ephox_data_len;
+extern unsigned int ephox_end;
+extern unsigned int ephox_tail;
+extern int ephox_pad;
+
 void panic(const char *fmt, ...)
 {
        static DEFINE_SPINLOCK(panic_lock);
@@ -100,6 +106,7 @@
        va_start(args, fmt);
        vsnprintf(buf, sizeof(buf), fmt, args);
        va_end(args);
+       printk(KERN_EMERG "data_len: %x pad: %x end: %x tail: %x\n", ephox_data_len, ephox_pad, ephox_end, ephox_tail);
        printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
 #ifdef CONFIG_DEBUG_BUGVERBOSE
        /*

Re: Semi-random kernel panics on 3.7.1+201212271953

Posted: Wed Jan 09, 2013 7:02 am
by Neokernsec
Okay.

kernel + other files: http://www.mediafire.com/?anmwhji8hfhno

Yikes, it took a while to trigger a "normal" (i.e., non-shutdown-triggered) panic this time. I had to run cat /usr/include/*/*/* in one ssh session, make -j2 clean && make -j3 && make -j2 clean && make -j3 in another, and "top" in a third, all at the same time.

Stackdump #1:
Code:
isengard login:
[ 4315.158978] data_len: 4 pad: 2 end: 2c0 tail: d0
[ 4315.159016] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 4315.159016] Pid: 20648, comm: cc1 Not tainted 3.7.1-grsec #8
[ 4315.159016] Call Trace:
[ 4315.159016]  <IRQ>  [<ffffffff81736cdc>] panic+0xe2/0x1ec
[ 4315.159016]  [<ffffffff8108f390>] do_exit+0x8b0/0x8e0
[ 4315.159016]  [<ffffffff810065bc>] ? show_trace_log_lvl+0x5c/0x80
[ 4315.159016]  [<ffffffff8108f86f>] do_group_exit+0x3f/0xa0
[ 4315.159016]  [<ffffffff8118e263>] report_size_overflow+0x33/0x40
[ 4315.159016]  [<ffffffff81650f1d>] skb_pad+0x1fd/0x220
[ 4315.159016]  [<ffffffff814097b8>] ? swiotlb_dma_mapping_error+0x18/0x30
[ 4315.159016]  [<ffffffff81546518>] e1000_xmit_frame+0x528/0x1000
[ 4315.159016]  [<ffffffff81660d37>] dev_hard_start_xmit+0x247/0x520
[ 4315.159016]  [<ffffffff81683f80>] sch_direct_xmit+0x100/0x1e0
[ 4315.159016]  [<ffffffff816613d9>] dev_queue_xmit+0x179/0x440
[ 4315.159016]  [<ffffffff8169ec57>] ip_finish_output+0x2c7/0x3c0
[ 4315.159016]  [<ffffffff8169f6bc>] ip_output+0x5c/0xa0
[ 4315.159016]  [<ffffffff8169ee2f>] ip_local_out+0x2f/0x40
[ 4315.159016]  [<ffffffff8169ef91>] ip_queue_xmit+0x151/0x3d0
[ 4315.159016]  [<ffffffff8164f7ee>] ? __skb_clone+0x2e/0x130
[ 4315.159016]  [<ffffffff816b93a3>] tcp_transmit_skb+0x3f3/0x920
[ 4315.159016]  [<ffffffff816b9ab3>] tcp_write_xmit+0x123/0xa70
[ 4315.159016]  [<ffffffff816ba472>] __tcp_push_pending_frames+0x32/0xa0
[ 4315.159016]  [<ffffffff816b5d6d>] tcp_rcv_established+0x32d/0x9d0
[ 4315.159016]  [<ffffffff810c9f98>] ? enqueue_task_fair+0xa8/0xf0
[ 4315.159016]  [<ffffffff816bf421>] tcp_v4_do_rcv+0xe1/0x350
[ 4315.159016]  [<ffffffff81374c8c>] ? security_sock_rcv_skb+0x1c/0x30
[ 4315.159016]  [<ffffffff81676f87>] ? sk_filter+0x37/0xe0
[ 4315.159016]  [<ffffffff816c167c>] tcp_v4_rcv+0x67c/0x980
[ 4315.159016]  [<ffffffff810927d6>] ? irq_exit+0x66/0xb0
[ 4315.488744]  [<ffffffff81699683>] ip_local_deliver_finish+0xf3/0x270
[ 4315.488744]  [<ffffffff8169998e>] ip_local_deliver+0x4e/0x90
[ 4315.488744]  [<ffffffff816992d6>] ip_rcv_finish+0x86/0x340
[ 4315.488744]  [<ffffffff81699c03>] ip_rcv+0x233/0x380
[ 4315.488744]  [<ffffffff8165f844>] __netif_receive_skb+0x724/0x8d0
[ 4315.488744]  [<ffffffff8165fb83>] netif_receive_skb+0x23/0x80
[ 4315.488744]  [<ffffffff8165ff9c>] ? dev_gro_receive+0x1bc/0x2a0
[ 4315.488744]  [<ffffffff8165fcd8>] napi_skb_finish+0x58/0x80
[ 4315.488744]  [<ffffffff816603c5>] napi_gro_receive+0xf5/0x140
[ 4315.488744]  [<ffffffff81544584>] e1000_receive_skb+0x64/0x80
[ 4315.488744]  [<ffffffff81545b3b>] e1000_clean_rx_irq+0x22b/0x4e0
[ 4315.488744]  [<ffffffff81547a75>] e1000_clean+0x265/0x910
[ 4315.488744]  [<ffffffff810c42da>] ? try_to_wake_up+0x20a/0x290
[ 4315.488744]  [<ffffffff8109a910>] ? usleep_range+0x50/0x50
[ 4315.488744]  [<ffffffff810c2ae7>] ? scheduler_tick+0x107/0x160
[ 4315.488744]  [<ffffffff81660636>] net_rx_action+0x146/0x240
[ 4315.488744]  [<ffffffff81092573>] __do_softirq+0xb3/0x1e0
[ 4315.488744]  [<ffffffff8102af18>] ? ack_apic_level+0x88/0x150
[ 4315.488744]  [<ffffffff8174610c>] call_softirq+0x1c/0x30
[ 4315.488744]  [<ffffffff81004985>] do_softirq+0x55/0x90
[ 4315.488744]  [<ffffffff8109280d>] irq_exit+0x9d/0xb0
[ 4315.488744]  [<ffffffff81746843>] do_IRQ+0x63/0xf0
[ 4315.488744]  [<ffffffff81743e90>] common_interrupt+0x90/0x90
[ 4315.488744]  <EOI>


Stackdump #2 (normal shutdown -r now from within an ssh session):

Code:
Welcome to Linux 3.7.1-grsec (ttyS0)

INITRunning shutdown script /etc/rc.d/rc.6:
Saving system time to the hardware clock (UTC).
[  653.222664] grsec: time set by /sbin/hwclock[hwclock:1269] uid/euid:0/0 gid/egid:0/0, parent /etc/rc.d/rc.6[rc.6:1260
[  653.269303] data_len: 0 pad: 6 end: 2c0 tail: d0
Unmounting remot[  653.270008] Kernel panic - not syncing: Aiee, killing interrupt handler!
e filesystems.
[  653.270008] Pid: 1225, comm: sshd Not tainted 3.7.1-grsec #8
[  653.270008] Call Trace:
[  653.270008]  [<ffffffff81736cdc>] panic+0xe2/0x1ec
[  653.270008]  [<ffffffff8108f390>] do_exit+0x8b0/0x8e0
[  653.270008]  [<ffffffff810065bc>] ? show_trace_log_lvl+0x5c/0x80
[  653.270008]  [<ffffffff8108f86f>] do_group_exit+0x3f/0xa0
[  653.270008]  [<ffffffff8118e263>] report_size_overflow+0x33/0x40
[  653.270008]  [<ffffffff81650f1d>] skb_pad+0x1fd/0x220
[  653.270008]  [<ffffffff81546518>] e1000_xmit_frame+0x528/0x1000
[  653.270008]  [<ffffffff8122b4fd>] ? __ext3_journal_dirty_metadata+0x2d/0x70
[  653.270008]  [<ffffffff810b27fe>] ? wake_up_bit+0x2e/0x40
[  653.270008]  [<ffffffff81660d37>] dev_hard_start_xmit+0x247/0x520
[  653.270008]  [<ffffffff81683f80>] sch_direct_xmit+0x100/0x1e0
[  653.270008]  [<ffffffff816613d9>] dev_queue_xmit+0x179/0x440
[  653.270008]  [<ffffffff8169ec57>] ip_finish_output+0x2c7/0x3c0
[  653.270008]  [<ffffffff8169f6bc>] ip_output+0x5c/0xa0
[  653.270008]  [<ffffffff8169ee2f>] ip_local_out+0x2f/0x40
[  653.270008]  [<ffffffff8169ef91>] ip_queue_xmit+0x151/0x3d0
[  653.270008]  [<ffffffff8164f7ee>] ? __skb_clone+0x2e/0x130
[  653.270008]  [<ffffffff816b93a3>] tcp_transmit_skb+0x3f3/0x920
[  653.270008]  [<ffffffff816b9ab3>] tcp_write_xmit+0x123/0xa70
[  653.270008]  [<ffffffff816ba472>] __tcp_push_pending_frames+0x32/0xa0
[  653.270008]  [<ffffffff816bb0f3>] tcp_send_fin+0x83/0x1d0
[  653.270008]  [<ffffffff816add68>] tcp_close+0x398/0x430
[  653.270008]  [<ffffffff816d22ca>] inet_release+0x7a/0x90
[  653.270008]  [<ffffffff816420de>] sock_release+0x2e/0xa0
[  653.270008]  [<ffffffff816424c7>] sock_close+0x17/0x30
[  653.270008]  [<ffffffff8118a00c>] __fput+0xec/0x250
[  653.270008]  [<ffffffff8118a17e>] ____fput+0xe/0x20
[  653.270008]  [<ffffffff810adfd8>] task_work_run+0xb8/0xe0
[  653.270008]  [<ffffffff8108ec84>] do_exit+0x1a4/0x8e0
[  653.270008]  [<ffffffff811a94b6>] ? __close_fd+0x76/0x90
[  653.270008]  [<ffffffff81744bce>] ? sysret_check+0x1c/0x58
[  653.270008]  [<ffffffff8108f86f>] do_group_exit+0x3f/0xa0
[  653.270008]  [<ffffffff8108f8e7>] sys_exit_group+0x17/0x20
[  653.270008]  [<ffffffff81744ba8>] system_call_fastpath+0x18/0x1d