
Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Sat Jun 28, 2014 12:57 pm
by giact
Hello!

I am running Gentoo with hardened-sources-3.14.5-r2 (amd64) and, every now and then, my kernel dies.

Here is the kernel panic screen picture: https://imgur.com/LrdRUT8

And here is my kernel config, for completeness: https://gist.github.com/giact/b1f3baa6477c561411c7

I am not entirely sure what I am doing, but after learning a bit of gdb, I got to this:

Code:
   0xffffffff81342acf <+310>:   add    %r8,%rcx
   0xffffffff81342ad2 <+313>:   cmp    $0x7fffffff,%rcx
   0xffffffff81342ad9 <+320>:   jle    0xffffffff81342ae4 <__sk_mem_schedule+331>
   0xffffffff81342adb <+322>:   mov    $0xffffffff815a3fc2,%rcx
   0xffffffff81342ae2 <+329>:   jmp    0xffffffff81342af4 <__sk_mem_schedule+347>
   0xffffffff81342ae4 <+331>:   cmp    $0xffffffff80000000,%rcx
   0xffffffff81342aeb <+338>:   jge    0xffffffff81342b0c <__sk_mem_schedule+371>
   0xffffffff81342aed <+340>:   mov    $0xffffffff815a4004,%rcx
   0xffffffff81342af4 <+347>:   mov    $0xffffffff815a3fdf,%rdx
   0xffffffff81342afb <+354>:   mov    $0x583,%esi
   0xffffffff81342b00 <+359>:   mov    $0xffffffff815a3ff1,%rdi
   0xffffffff81342b07 <+366>:   callq  0xffffffff8110a345 <report_size_overflow>
   0xffffffff81342b0c <+371>:   add    $0xfff,%ecx


So I assumed the overflow check starts at 0xffffffff81342acf.
Then I did this:

Code:
(gdb) l *0xffffffff81342acf
0xffffffff81342acf is in __sk_mem_schedule (net/core/sock.c:2014).
2009
2010                    if (!sk_under_memory_pressure(sk))
2011                            return 1;
2012                    alloc = sk_sockets_allocated_read_positive(sk);
2013                    if (sk_prot_mem_limits(sk, 2) > alloc *
2014                        sk_mem_pages(sk->sk_wmem_queued +
2015                                     atomic_read(&sk->sk_rmem_alloc) +
2016                                     sk->sk_forward_alloc))
2017                            return 1;
2018            }


So, I think the problem is there somehow.
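
For anyone following the disassembly: the two cmp instructions look like a range check on a 64-bit intermediate. The value in %rcx is compared against 0x7fffffff and against 0xffffffff80000000 (which, sign-extended, is INT_MIN), and report_size_overflow is called when it falls outside that range. A minimal C sketch of that check, assuming a 32-bit int and assuming the compared value is the widened sum; check_int_range is a made-up helper name, not something from the kernel or the size_overflow plugin:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper that mirrors the two cmp instructions in the listing.
 * Returns  1 if value is above INT_MAX (0x7fffffff),
 *         -1 if value is below INT_MIN (0xffffffff80000000 as a 64-bit value),
 *          0 if value fits in a signed 32-bit int. */
int check_int_range(int64_t value)
{
    if (value > INT_MAX)
        return 1;   /* corresponds to the first cmp/jle pair */
    if (value < INT_MIN)
        return -1;  /* corresponds to the second cmp/jge pair */
    return 0;       /* in range: execution continues at +371 */
}
```

A nonzero result corresponds to the paths that end in the callq to report_size_overflow.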

Can anyone help me out?
Thanks in advance!

Re: Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Sat Jun 28, 2014 7:00 pm
by ephox
Hi,

Could you please send me the whole report_size_overflow message (It starts with "PAX: size overflow detected in function") and the results of this patch:
Code:
--- net/core/sock.c.orig        2014-06-29 00:23:09.275963142 +0200
+++ net/core/sock.c     2014-06-29 00:39:58.087976745 +0200
@@ -2005,14 +2005,16 @@
        }
 
        if (sk_has_memory_pressure(sk)) {
-               int alloc;
+               int alloc, atomic_ret;
 
                if (!sk_under_memory_pressure(sk))
                        return 1;
                alloc = sk_sockets_allocated_read_positive(sk);
+               atomic_ret = atomic_read(&sk->sk_rmem_alloc);
+               printk(KERN_ERR "PAX: sk_wmem_queued: %x sk_forward_alloc: %x atomic_ret: %x\n", sk->sk_wmem_queued, atomic_ret, sk->sk_forward_alloc);
                if (sk_prot_mem_limits(sk, 2) > alloc *
                    sk_mem_pages(sk->sk_wmem_queued +
-                                atomic_read(&sk->sk_rmem_alloc) +
+                                atomic_ret +
                                 sk->sk_forward_alloc))
                        return 1;
        }
--- fs/exec.c.orig      2014-06-29 00:30:42.147969249 +0200
+++ fs/exec.c   2014-06-29 00:41:45.487978193 +0200
@@ -2089,6 +2089,7 @@
 {
        printk(KERN_ERR "PAX: size overflow detected in function %s %s:%u %s", func, file, line, ssa_name);
        dump_stack();
+       printk(KERN_ERR "PAX: size overflow detected in function %s %s:%u %s", func, file, line, ssa_name);
        do_group_exit(SIGKILL);
 }
 EXPORT_SYMBOL(report_size_overflow);


also can you reproduce it with the latest kernel version?

Re: Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Sun Jun 29, 2014 6:07 am
by giact
Thanks for your help!

ephox wrote:Could you please send me the whole report_size_overflow message (It starts with "PAX: size overflow detected in function")


Unfortunately, no such message ever appears on the screen when the system locks up.
And after I reboot, journalctl shows nothing.

ephox wrote:and the results of this patch


All right, thanks! I will patch and compile and see what happens.

BTW, are you trying to figure out if the overflow detection happens with atomic_read(&sk->sk_rmem_alloc) and/or if the sum of those three terms actually overflows?

ephox wrote:also can you reproduce it with the latest kernel version?


Last week the same issue happened to me while using hardened-sources-3.14.6:
https://imgur.com/1wdf1N2

PS: It's actually not easy to reproduce because it happens at seemingly random times (about once every 2 or 3 days).

Re: Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Sun Jun 29, 2014 4:36 pm
by ephox
giact wrote:BTW, are you trying to figure out if the overflow detection happens with atomic_read(&sk->sk_rmem_alloc) and/or if the sum of those three terms actually overflows?

The overflow can only happen on the sum, not in atomic_read and that's why I need the values of each term.
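
To illustrate the point: as far as I can tell each of the three terms is an int, so reading one of them cannot overflow, but their sum can leave the int range. A rough sketch, assuming a 32-bit int, that adds the three terms in a 64-bit type and reports whether the result still fits. The function name sum_fits_int and its parameter names are illustrative only, and this is not how the size_overflow plugin itself is implemented:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds the three terms in 64 bits so the addition
 * itself cannot wrap, stores the exact sum in *sum_out (if non-NULL),
 * and returns true when that sum is representable as an int. */
bool sum_fits_int(int wmem_queued, int rmem_alloc, int forward_alloc,
                  int64_t *sum_out)
{
    int64_t sum = (int64_t)wmem_queued + rmem_alloc + forward_alloc;

    if (sum_out != NULL)
        *sum_out = sum;
    return sum >= INT_MIN && sum <= INT_MAX;
}
```

Feeding it the three values printed by the patch shows directly whether a given log line is past the limit.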

Re: Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Mon Jun 30, 2014 4:08 am
by giact
ephox wrote:The overflow can only happen on the sum, not in atomic_read and that's why I need the values of each term.


Ah I see, thanks!

By the way: since I was unable to see the PaX overflow message, isn't it likely that I won't be able to see the custom printk we just inserted, either?
Do you think I should try and figure out how to set up a serial console for logging everything to a second machine?

PS: so far the kernel has not yet panicked again

Re: Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Tue Jul 01, 2014 9:45 am
by PaX Team
since you got the output of dump_stack(), chances are that you'll also get the following printk as well. even better would of course be if you could capture the kernel logs somehow, via netconsole or a serial console. i also looked at the code a bit and the overflow path seems to trigger under some kind of memory pressure, so maybe you could try to run something that eats up memory and see if it helps (if this is a production machine then better just wait it out though ;)).

Re: Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Tue Jul 01, 2014 6:02 pm
by giact
PaX Team wrote:since you got the output of dump_stack(), chances are that you'll also get the following printk as well. even better would of course be if you could capture the kernel logs somehow, via netconsole or a serial console.


I will set up a serial console soon (this machine has no serial port, so I ordered a PCIe serial card, since it's always useful).
I cannot use netconsole because my eth0 is a slave in a bonded interface (together with wlan0), and I don't want to change my network configuration because it might be part of what causes the issue (eth0 is connected to an ethernet-over-power adapter, so the link can be intermittent every now and then).

PaX Team wrote:i also looked at the code a bit and the overflow path seems to trigger under some kind of memory pressure, so maybe you could try to run something that eats up memory and see if it helps (if this is a production machine then better just wait it out though ;)).


Tonight I tried putting memory under pressure with the "stress" tool, together with heavy network usage, but it has not happened again yet :(
BTW, it's not a production machine, it's just my work station, so I can run any sort of crazy tests during off-work hours.

Re: Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Fri Jul 04, 2014 2:17 pm
by giact
I have news.

Today it finally crashed again.

Unfortunately, it crashed while I was afk, and when I came back the screen was in standby mode (as if it went into power saving) and for some unknown reason I could not bring it back on.
Fortunately, I was also sending printk messages to the serial port.

This is all it showed:
Code:
PAX: sk_wmem_queued: 7d155800 sk_forward_alloc: 0 atomic_ret: 1800
PAX: sk_wmem_queued: 7d156a00 sk_forward_alloc: 0 atomic_ret: 1600
PAX: sk_wmem_queued: 7d157c00 sk_forward_alloc: 0 atomic_ret: 1400
PAX: sk_wmem_queued: 7d158e00 sk_forward_alloc: 0 atomic_ret: 1200
PAX: sk_wmem_queued: 7d15a000 sk_forward_alloc: 0 atomic_ret: 1000
PAX: sk_wmem_queued: 7d15a900 sk_forward_alloc: 0 atomic_ret: 1700
PAX: sk_wmem_queued: 7d15bb00 sk_forward_alloc: 0 atomic_ret: 1500
PAX: sk_wmem_queued: 7d15cd00 sk_forward_alloc: 0 atomic_ret: 1300
PAX: sk_wmem_queued: 7d15df00 sk_forward_alloc: 0 atomic_ret: 1100
PAX: sk_wmem_queued: 7d15e800 sk_forward_alloc: 0 atomic_ret: 1800
PAX: sk_wmem_queued: 7d15fa00 sk_forward_alloc: 0 atomic_ret: 1600
PAX: sk_wmem_queued: 7d160c00 sk_forward_alloc: 0 atomic_ret: 1400
PAX: sk_wmem_queued:


Unfortunately, there is no stack trace.

PS: given that huge number, I guess it really is overflowing past the signed int limit.
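
As a rough sanity check on those numbers: the last complete line shows 0x7d160c00, which is close to but still below 0x7fffffff, and adding the other nonzero column (0x1400) gives 0x7d162000. Note that the printk in the patch passes atomic_ret before sk->sk_forward_alloc while the format string labels them the other way round, so the last two columns may be swapped; the sum is the same either way. A small sketch, assuming a 32-bit int and a constant step size, of how many more increments would fit before the value passes INT_MAX. The helper name increments_before_overflow is made up for illustration:

```c
#include <limits.h>

/* Hypothetical helper: number of times `step` can still be added to
 * `current` without the result exceeding INT_MAX.
 * Assumes current >= 0 and step > 0. */
long long increments_before_overflow(int current, int step)
{
    long long headroom = (long long)INT_MAX - current;

    return headroom / step;
}
```

With current = 0x7d160c00 and step = 0x1200 this comes out on the order of ten thousand further increments.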

Re: Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Fri Jul 04, 2014 7:01 pm
by PaX Team
yeah, this looks like a real bug, not a false positive, sk_wmem_queued gets incremented by 0x1200 indefinitely for some reason. since we're not familiar with this network code at all, the best way forward would be if you could let the upstream maintainers know about this and they can hopefully help you debug it to its root cause.
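
The step size can be read straight off the posted log by differencing successive sk_wmem_queued values. In the twelve complete lines, most steps are 0x1200, with two steps of 0x900 mixed in. A minimal sketch of that calculation; the array is copied from the log in the earlier post, while successive_deltas is a hypothetical helper written for this illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* sk_wmem_queued values copied from the complete lines of the posted log. */
const uint32_t logged_wmem_queued[] = {
    0x7d155800u, 0x7d156a00u, 0x7d157c00u, 0x7d158e00u,
    0x7d15a000u, 0x7d15a900u, 0x7d15bb00u, 0x7d15cd00u,
    0x7d15df00u, 0x7d15e800u, 0x7d15fa00u, 0x7d160c00u,
};

/* Hypothetical helper: writes values[i + 1] - values[i] into deltas[i]
 * and returns the number of deltas written (n - 1, or 0 if n < 2).
 * The caller must provide room for at least n - 1 entries in deltas. */
size_t successive_deltas(const uint32_t *values, size_t n, int64_t *deltas)
{
    if (n < 2)
        return 0;
    for (size_t i = 0; i + 1 < n; i++)
        deltas[i] = (int64_t)values[i + 1] - (int64_t)values[i];
    return n - 1;
}
```

Calling it on logged_wmem_queued with a twelve-entry count fills eleven deltas, which makes the growth pattern easy to include when reporting this to the upstream maintainers.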

Re: Kernel Panic (report_size_overflow) in __sk_mem_schedule

Posted: Sat Jul 05, 2014 5:35 am
by giact
All right, I will keep investigating then.
Thanks a lot for your help!