We have been chasing a boot problem on our older AMD Thunderbird systems (Asus/A7VC) that started with 4.1.7 and continues through 4.3.3.
We are using Gentoo hardened sources; however, what I am reporting here was tested with vanilla 4.3.3 + grsecurity-3.1-4.3.3-201512282134.patch (only).
Quick synopsis:
Works: hardened-sources-3.18.9
Works: hardened-sources-4.0.8
Fails: hardened-sources-4.1.7-r1 (It fails due to some sort of ACPI regression so may not be grsecurity related)
Works: vanilla-sources-4.1.7 (!grsecurity)
Fails: hardened-sources-4.3.3-r3 (4.3.3+grsecurity crashes consistently in same place)
Works: vanilla-sources-4.3.3 (!grsecurity)
Fails: vanilla-sources-4.3.3 + grsecurity-3.1-4.3.3-201512282134.patch (Same crash result as hardened-sources-4.3.3-r3)
I also tested with the kernel configured for UP instead of SMP, with the same crash results.
I have put the following up at: https://secure.Futurequest.net/4grsecurity/
1) serial console capture logs (from all the various tests). The Vanilla+grsecurity ones are named: 4.3.3-6-VANILLA+GRSEC*
2) kernel .config file
3) tarball from (make targz-pkg)
4) the tweaked namespace.c and x86/.../atomic64_32.h using printk() for isolation purposes
5) output of: make fs/namespace.i (and also fs/namespace.o), for versions 4.0.8 and 4.3.3
While looking at namespace.i, I wanted to see whether anything had changed between 4.0.8 and 4.3.3 with regard to atomic64_add_return_unchecked().
It seems there were a few changes between 4.0.8 and 4.3.3 in x86/.../alternative.h, which show up in the expansion of:
static inline __attribute__((always_inline)) __attribute__((no_instrument_function)) long long atomic64_add_return_unchecked(long long i, atomic64_unchecked_t *v)
4.0.8 asm volatile ("661:\n\t" "call %P[old]" "\n662:\n" ".pushsection .altinstructions,\"a\"\n" " .long 661b - .\n" " .long " "663""1""f - .\n" " .word " "( 0*32+ 8)" "\n" " .byte " "662b-661b" "\n" " .byte " "664""1""f-""663""1""f" "\n" ".popsection\n" ".pushsection .discard,\"aw\",@progbits\n" " .byte 0xff + (" "664""1""f-""663""1""f" ") - (" "662b-661b" ")\n" ".popsection\n" ".pushsection .altinstr_replacement, \"a\"\n" "663""1"":\n\t" "call %P[new]" "\n" "664""1" ":\n\t" ".popsection" : "+A" (i), "+c" (v) : [old] "i" (atomic64_add_return_unchecked_386), [new] "i" (atomic64_add_return_unchecked_cx8), "i" (0) : "memory")
4.3.3 asm volatile ("661:\n\t" "call %P[old]" "\n662:\n" ".skip -(((" "665""1""f-""664""1""f" ")-(" "662b-661b" ")) > 0) * " "((" "665""1""f-""664""1""f" ")-(" "662b-661b" ")),0x90\n" "663" ":\n" ".pushsection .altinstructions,\"a\"\n" " .long 661b - .\n" " .long " "664""1""f - .\n" " .word " "( 0*32+ 8)" "\n" " .byte " "663""b-661b" "\n" " .byte " "665""1""f-""664""1""f" "\n" " .byte " "663""b-662b" "\n" ".popsection\n" ".pushsection .altinstr_replacement, \"a\"\n" "664""1"":\n\t" "call %P[new]" "\n" "665""1" ":\n\t" ".popsection" : "+A" (i), "+c" (v) : [old] "i" (atomic64_add_return_unchecked_386), [new] "i" (atomic64_add_return_unchecked_cx8), "i" (0) : "memory")
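To make the difference between the two expansions above easier to see, here is a minimal Python sketch (not part of the kernel; the names ENTRY_4_0_8, ENTRY_4_3_3, gas_skip_count and feature_word_and_bit are made up for illustration). It models three things that can be read straight off the quoted text: the size of one .altinstructions entry in each version, the NOP padding that the new .skip expression emits, and the word/bit encoded in "( 0*32+ 8)". It assumes that both "call %P[old]" and "call %P[new]" assemble to a 5-byte call rel32, and it relies on GNU as evaluating a true comparison to -1.

```python
import struct

# Directive sequences read off the two expansions quoted above.
# "<" means little-endian with no padding between fields.
ENTRY_4_0_8 = "<iiHBB"    # .long, .long, .word, .byte, .byte
ENTRY_4_3_3 = "<iiHBBB"   # same fields plus one extra .byte (663b-662b)

CALL_REL32_LEN = 5        # assumption: opcode 0xE8 + 32-bit displacement


def gas_skip_count(orig_len, repl_len):
    """Model `.skip -((repl-orig) > 0) * (repl-orig), 0x90`.

    GNU as yields -1 for a true comparison and 0 for a false one,
    so the result is (repl-orig) when positive and 0 otherwise.
    """
    diff = repl_len - orig_len
    truth = -1 if diff > 0 else 0
    return -truth * diff


def feature_word_and_bit(value):
    """Split a feature number of the form word*32+bit into (word, bit)."""
    return divmod(value, 32)


print("entry size 4.0.8:", struct.calcsize(ENTRY_4_0_8))
print("entry size 4.3.3:", struct.calcsize(ENTRY_4_3_3))
print("NOP padding for call->call:", gas_skip_count(CALL_REL32_LEN, CALL_REL32_LEN))
print("feature (word, bit):", feature_word_and_bit(0 * 32 + 8))
```

If the 5-byte assumption holds, the .skip emits no padding at this call site, so the practical difference here would be the extra length byte per entry rather than the instruction bytes themselves.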
I could really use a second set of eyes on this from someone who understands the assembly at this level.
What surprises me, though, is that alloc_mnt_ns() is called earlier in the boot process and runs properly; it is on the later, second invocation that it crashes consistently.
The kernel GPF crash (file: 4.3.3-6-VANILLA+GRSEC):
smpboot: Total of 1 processors activated (2668.57 BogoMIPS)
TT:alloc_mnt_ns: 2772
TT:alloc_mnt_ns: 2774
TT:alloc_mnt_ns: 2777
TT:alloc_mnt_ns: 2779
TT:alloc_mnt_ns: 2785
TT:alloc_mnt_ns: 2787
TT:alloc_mnt_ns: 2788, mnt_ns_seq: c2028ba0 : 2
TT:atomic64: 1
PAX: suspicious general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 10 Comm: kdevtmpfs Not tainted 4.3.3-grsec #3
Hardware name: System Manufacturer System Name/A7VC , BIOS ASUS A7VC ACPI BIOS Revision 1003 12/20/2002
task: dec4ba40 ti: dec4bcb8 task.ti: dec4bcb8
EIP: 0060:[<0012dcc3>] EFLAGS: 00010246 CPU: 0
EAX: 00000001 EBX: dec06580 ECX: c2028ba0 EDX: 00000000
ESI: 00000000 EDI: 00020200 EBP: dec1fb60 ESP: dec73e04
DS: 0068 ES: 0068 FS: 00d8 GS: 0068 SS: 0068
CR0: 8005003b CR2: ffd0f000 CR3: 01c03000 CR4: 000002d0
Stack:
c1cea9bf 00000001 00000ae4 c2028ba0 00000002 00000000 c2043e00 df6039d4
8c34fca6 dec4e210 00130546 df228564 df6039d4 ffffffff 001077c9 dec06b80
dec06f84 c2024280 dec06ec4 df228564 000000d0 65c10460 8c34fca6 dec06ed4
Call Trace:
[<00130546>] ? copy_mnt_ns+0x86/0x770
[<001077c9>] ? cache_alloc_refill+0x1b9/0x4d0
[<00020200>] ? pt_event_init+0x40/0x190
[<0005daa3>] ? create_new_namespaces+0x53/0x140
[<00020200>] ? pt_event_init+0x40/0x190
[<0005dca1>] ? unshare_nsproxy_namespaces+0x51/0x90
[<00020200>] ? pt_event_init+0x40/0x190
[<00042c98>] ? SyS_unshare+0x178/0x2b0
[<003a32a0>] ? handle_remove+0x240/0x240
[<003a32cc>] ? devtmpfsd+0x2c/0x2e0
[<00020000>] ? topa_insert_table+0xb0/0x110
[<0007396a>] ? pick_next_task_fair+0x41a/0x4a0
[<00062de8>] ? ttwu_do_wakeup+0x18/0xf0
[<00002dc6>] ? nmi_print_seq+0x86/0x3d0
[<00353537>] ? acpi_enter_sleep_state_s4bios+0x37/0x70
[<0054e73c>] ? __schedule+0x21c/0xa30
[<00078a04>] ? __wake_up_common+0x44/0x70
[<00078a47>] ? __wake_up_locked+0x17/0x20
[<003a32a0>] ? handle_remove+0x240/0x240
[<0005d36c>] ? kthread+0x9c/0xc0
[<00552b4b>] ? ret_from_kernel_thread+0x1b/0x30
[<0005d2d0>] ? kthread_create_on_node+0x110/0x110
Code: 24 14 89 44 24 10 e8 71 fd f9 ff c7 44 24 04 01 00 00 00 c7 04 24 bf a9 ce c1 e8 5d fd f9 ff b8 01 00 00 00 31 d2 b9 a0 8b 02 c2 <e8> 18 c1 1c 3f c7 44 24 04 02 00 00 00 c7 04 24 bf a9 ce c1 89
EIP: [<0012dcc3>] alloc_mnt_ns.isra.28+0x173/0x2b0 SS:ESP 0068:dec73e04
---[ end trace 86bffebd5ec512c9 ]---
Kernel panic - not syncing: grsec: halting the system due to suspicious kernel crash caused by root
The scripts/decodecode output for the above is (debugging printk() version):
Code: 24 14 89 44 24 10 e8 71 fd f9 ff c7 44 24 04 01 00 00 00 c7 04 24 bf a9 ce c1 e8 5d fd f9 ff b8 01 00 00 00 31 d2 b9 a0 8b 02 c2 <e8> 18 c1 1c 3f c7 44 24 04 02 00 00 00 c7 04 24 bf a9 ce c1 89
All code
========
0: 24 14 and $0x14,%al
2: 89 44 24 10 mov %eax,0x10(%esp)
6: e8 71 fd f9 ff call 0xfff9fd7c
b: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
12: 00
13: c7 04 24 bf a9 ce c1 movl $0xc1cea9bf,(%esp)
1a: e8 5d fd f9 ff call 0xfff9fd7c
1f: b8 01 00 00 00 mov $0x1,%eax
24: 31 d2 xor %edx,%edx
26: b9 a0 8b 02 c2 mov $0xc2028ba0,%ecx
2b:* e8 18 c1 1c 3f call 0x3f1cc148 <-- trapping instruction
30: c7 44 24 04 02 00 00 movl $0x2,0x4(%esp)
37: 00
38: c7 04 24 bf a9 ce c1 movl $0xc1cea9bf,(%esp)
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: e8 18 c1 1c 3f call 0x3f1cc11d
5: c7 44 24 04 02 00 00 movl $0x2,0x4(%esp)
c: 00
d: c7 04 24 bf a9 ce c1 movl $0xc1cea9bf,(%esp)
14: 89 .byte 0x89
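The call targets printed by decodecode are relative to the start of whatever bytes it was given, which is why the same instruction shows up as 0x3f1cc148 in one listing and 0x3f1cc11d in the other. The following Python sketch (a cross-check only; the helper name call_target is made up for illustration) rebases the three call instructions on the reported EIP 0x0012dcc3, assuming the marked <e8> byte at dump offset 0x2b sits exactly at that EIP.

```python
import struct

EIP = 0x0012DCC3      # from the "EIP:" line of the oops
TRAP_OFFSET = 0x2B    # offset of the <e8> byte within the "Code:" dump


def call_target(insn_addr, insn_bytes):
    """Return the destination of a 5-byte `call rel32` (opcode 0xE8)."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    # The displacement is a signed little-endian 32-bit value,
    # relative to the address of the next instruction.
    (rel32,) = struct.unpack("<i", insn_bytes[1:])
    return (insn_addr + 5 + rel32) & 0xFFFFFFFF


# The three call instructions from the dump, keyed by dump offset.
calls = {
    0x06: bytes.fromhex("e871fdf9ff"),   # first debugging printk()
    0x1A: bytes.fromhex("e85dfdf9ff"),   # second debugging printk()
    0x2B: bytes.fromhex("e818c11c3f"),   # trapping instruction
}

targets = {
    off: call_target(EIP - TRAP_OFFSET + off, raw)
    for off, raw in calls.items()
}

for off, target in targets.items():
    print(f"dump offset {off:#04x}: call target {target:#010x}")
```

Under those assumptions both printk() calls resolve to the same address, 0x000cda14, while the trapping call resolves to 0x3f2f9de0, which is far away from every address in the call trace. That may point at the call displacement rather than at the callee itself, but it is only an inference from the dump.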
I have also included a scripts/decodecode output for a kernel without the debugging printk() in: 4.3.3-6-VANILLA+GRSEC-CODE-WITHOUT-printk
Thank you for any assistance you can offer in helping us to fix this bootup GPF. Please let me know whatever else you need that will help with debugging this.
--
FutureQuest, Inc.
Terra