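The following deadlock was observed on all eight cores of a busy shell server running kernel 2.6.32.32 with the grsecurity patches and the BFS scheduler. Every core reports a soft lockup in _spin_lock, apparently reached from lookup_mnt.
It has been suggested that this might be grsecurity related. Any ideas?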
BUG: soft lockup - CPU#0 stuck for 61s! [irssi:32373]
CPU 0:
Pid: 32373, comm: irssi Tainted: G D 2.6.32.32-grsec-bfsha #1 ProLiant DL380 G5
RIP: 0010:[<ffffffff816d487a>] [<ffffffff816d487a>] _spin_lock+0xa/0x20
RSP: 0018:ffff8801b5611c70 EFLAGS: 00000293
RAX: 0000000000007974 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88041a92fa23 RDI: ffffffff8197d440
RBP: ffffffff81002f9e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffff01
R13: ffffffff810d7f50 R14: ffff8801b5611b88 R15: ffff880100000000
FS: 00007f7285d176e0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f7794000 CR3: 00000001b76ed000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff810df760>] ? lookup_mnt+0x10/0x50
[<ffffffff810cdbe0>] ? __follow_mount+0x80/0xa0
[<ffffffff810cde23>] ? do_lookup+0x143/0x270
[<ffffffff810cffde>] ? __link_path_walk+0x1ce/0x1230
[<ffffffff8102f627>] ? __wake_up_common+0x47/0x80
[<ffffffff810d131d>] ? path_walk+0x7d/0x100
[<ffffffff810d14eb>] ? do_path_lookup+0x5b/0x60
[<ffffffff810d2467>] ? user_path_at+0x97/0xd0
[<ffffffff810c7dda>] ? cp_new_stat+0x16a/0x180
[<ffffffff810c7be7>] ? vfs_fstatat+0x37/0x80
[<ffffffff81058569>] ? ktime_get_ts+0x69/0xd0
[<ffffffff810c7eaf>] ? sys_newstat+0x1f/0x50
[<ffffffff810d6b33>] ? sys_poll+0x73/0x110
[<ffffffff81003285>] ? device_not_available+0x15/0x20
[<ffffffff8100259b>] ? system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#1 stuck for 61s! [ninja:8891]
CPU 1:
Pid: 8891, comm: ninja Tainted: G D 2.6.32.32-grsec-bfsha #1 ProLiant DL380 G5
RIP: 0010:[<ffffffff816d4880>] [<ffffffff816d4880>] _spin_lock+0x10/0x20
RSP: 0018:ffff880413a77c80 EFLAGS: 00000297
RAX: 0000000000007574 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88041dc21c64 RDI: ffffffff8197d440
RBP: ffffffff81002f9e R08: 0000000000000000 R09: ffff880075faad80
R10: 0000000000404633 R11: 0000000000000246 R12: ffffffff810e2d67
R13: ffff8800409012c0 R14: ffff880413a77d68 R15: ffff880413a77d78
FS: 00007fa7fed456e0(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fa7fed5a000 CR3: 000000032dd23000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff810df760>] ? lookup_mnt+0x10/0x50
[<ffffffff810cdbe0>] ? __follow_mount+0x80/0xa0
[<ffffffff810cde23>] ? do_lookup+0x143/0x270
[<ffffffff8109cab1>] ? handle_mm_fault+0x611/0x910
[<ffffffff810cffde>] ? __link_path_walk+0x1ce/0x1230
[<ffffffff810b796d>] ? get_partial_node+0x1d/0x90
[<ffffffff810ba86c>] ? __slab_alloc+0x9c/0x490
[<ffffffff810d131d>] ? path_walk+0x7d/0x100
[<ffffffff810d14eb>] ? do_path_lookup+0x5b/0x60
[<ffffffff810d27ab>] ? do_filp_open+0x10b/0xc80
[<ffffffff810de2b9>] ? expand_files+0x49/0x260
[<ffffffff810de51a>] ? alloc_fd+0x4a/0x140
[<ffffffff810c159d>] ? do_sys_open+0x9d/0x160
[<ffffffff8100259b>] ? system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#2 stuck for 61s! [udevd:1801]
CPU 2:
Pid: 1801, comm: udevd Tainted: G D 2.6.32.32-grsec-bfsha #1 ProLiant DL380 G5
RIP: 0010:[<ffffffff816d487a>] [<ffffffff816d487a>] _spin_lock+0xa/0x20
RSP: 0018:ffff88041bea9cf0 EFLAGS: 00000293
RAX: 0000000000007674 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88041dc20ca3 RDI: ffffffff8197d440
RBP: ffffffff81002f9e R08: 0000000000000000 R09: 00007fffc5cea155
R10: 0000000000000072 R11: 0000000000000297 R12: ffff88041e2f9488
R13: ffff88041e2f9478 R14: ffffffff81058569 R15: 000000021e2f9468
FS: 00007f9fe61aa770(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000040d44c CR3: 000000041c7e2000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff810df760>] ? lookup_mnt+0x10/0x50
[<ffffffff810cdbe0>] ? __follow_mount+0x80/0xa0
[<ffffffff810cde23>] ? do_lookup+0x143/0x270
[<ffffffff8108383e>] ? __lock_page+0x5e/0x70
[<ffffffff810cffde>] ? __link_path_walk+0x1ce/0x1230
[<ffffffff810d131d>] ? path_walk+0x7d/0x100
[<ffffffff810d14eb>] ? do_path_lookup+0x5b/0x60
[<ffffffff810d15c3>] ? user_path_parent+0x73/0xa0
[<ffffffff810d1937>] ? do_unlinkat+0x47/0x290
[<ffffffff81022ef2>] ? do_page_fault+0x162/0x390
[<ffffffff8100259b>] ? system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#3 stuck for 61s! [fail2ban-server:10251]
CPU 3:
Pid: 10251, comm: fail2ban-server Tainted: G D 2.6.32.32-grsec-bfsha #1 ProLiant DL380 G5
RIP: 0010:[<ffffffff816d487e>] [<ffffffff816d487e>] _spin_lock+0xe/0x20
RSP: 0018:ffff8802113c9c70 EFLAGS: 00000293
RAX: 0000000000007c74 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88041a92fa23 RDI: ffffffff8197d440
RBP: ffffffff81002f9e R08: 0000000000000000 R09: 0000000000000000
R10: 00007f7854db2a50 R11: 0000000000000246 R12: ffff8802113c9be0
R13: ffffffff810500c0 R14: ffff8800039abe80 R15: 0000000000000000
FS: 0000000042495950(0063) GS:ffff8800282c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 0000000412e7a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff810df760>] ? lookup_mnt+0x10/0x50
[<ffffffff810cdbe0>] ? __follow_mount+0x80/0xa0
[<ffffffff810cde23>] ? do_lookup+0x143/0x270
[<ffffffff810db739>] ? touch_atime+0x79/0x180
[<ffffffff810cffde>] ? __link_path_walk+0x1ce/0x1230
[<ffffffff810d131d>] ? path_walk+0x7d/0x100
[<ffffffff810d14eb>] ? do_path_lookup+0x5b/0x60
[<ffffffff810d2467>] ? user_path_at+0x97/0xd0
[<ffffffff810ab3ad>] ? free_pages_and_swap_cache+0x9d/0xc0
[<ffffffff81058569>] ? ktime_get_ts+0x69/0xd0
[<ffffffff810c7be7>] ? vfs_fstatat+0x37/0x80
[<ffffffff810c7eaf>] ? sys_newstat+0x1f/0x50
[<ffffffff810d7cf3>] ? sys_select+0x63/0x190
[<ffffffff8100259b>] ? system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#4 stuck for 61s! [irssi:15169]
CPU 4:
Pid: 15169, comm: irssi Tainted: G D 2.6.32.32-grsec-bfsha #1 ProLiant DL380 G5
RIP: 0010:[<ffffffff816d4880>] [<ffffffff816d4880>] _spin_lock+0x10/0x20
RSP: 0018:ffff880250c79c70 EFLAGS: 00000297
RAX: 0000000000007874 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88041a92fa23 RDI: ffffffff8197d440
RBP: ffffffff81002f9e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffff01
R13: ffffffff810d7f50 R14: ffff880250c79b88 R15: ffff880200000000
FS: 00007f683d4e46e0(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000404112 CR3: 0000000322068000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff810df760>] ? lookup_mnt+0x10/0x50
[<ffffffff810cdbe0>] ? __follow_mount+0x80/0xa0
[<ffffffff810cde23>] ? do_lookup+0x143/0x270
[<ffffffff810cffde>] ? __link_path_walk+0x1ce/0x1230
[<ffffffff8102f627>] ? __wake_up_common+0x47/0x80
[<ffffffff810d131d>] ? path_walk+0x7d/0x100
[<ffffffff810d14eb>] ? do_path_lookup+0x5b/0x60
[<ffffffff810d2467>] ? user_path_at+0x97/0xd0
[<ffffffff81376f2b>] ? tty_write+0x37b/0x3a0
[<ffffffff810c7be7>] ? vfs_fstatat+0x37/0x80
[<ffffffff81058569>] ? ktime_get_ts+0x69/0xd0
[<ffffffff810c7eaf>] ? sys_newstat+0x1f/0x50
[<ffffffff810d6b33>] ? sys_poll+0x73/0x110
[<ffffffff81003285>] ? device_not_available+0x15/0x20
[<ffffffff8100259b>] ? system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#5 stuck for 61s! [rpc.idmapd:5195]
CPU 5:
Pid: 5195, comm: rpc.idmapd Tainted: G D 2.6.32.32-grsec-bfsha #1 ProLiant DL380 G5
RIP: 0010:[<ffffffff816d4880>] [<ffffffff816d4880>] _spin_lock+0x10/0x20
RSP: 0018:ffff880413bcbc80 EFLAGS: 00000293
RAX: 0000000000007774 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88041a92fa23 RDI: ffffffff8197d440
RBP: ffffffff81002f9e R08: 0000000000000000 R09: ffff88029b61a600
R10: 000000004da8327f R11: 0000000000000246 R12: ffff880413bcbfd8
R13: ffff880413bcbfd8 R14: ffff88041e20ea40 R15: 0000000000006d80
FS: 00007f7c2b28b6e0(0000) GS:ffff880028340000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f0188e0c000 CR3: 000000041b058000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff810df760>] ? lookup_mnt+0x10/0x50
[<ffffffff810cdbe0>] ? __follow_mount+0x80/0xa0
[<ffffffff810cde23>] ? do_lookup+0x143/0x270
[<ffffffff810cffde>] ? __link_path_walk+0x1ce/0x1230
[<ffffffff810500f0>] ? wake_bit_function+0x0/0x30
[<ffffffff810d131d>] ? path_walk+0x7d/0x100
[<ffffffff810d14eb>] ? do_path_lookup+0x5b/0x60
[<ffffffff810d27ab>] ? do_filp_open+0x10b/0xc80
[<ffffffff810de2b9>] ? expand_files+0x49/0x260
[<ffffffff810de51a>] ? alloc_fd+0x4a/0x140
[<ffffffff810c159d>] ? do_sys_open+0x9d/0x160
[<ffffffff8100259b>] ? system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#6 stuck for 61s! [irssi:21556]
CPU 6:
Pid: 21556, comm: irssi Tainted: G D 2.6.32.32-grsec-bfsha #1 ProLiant DL380 G5
RIP: 0010:[<ffffffff816d4880>] [<ffffffff816d4880>] _spin_lock+0x10/0x20
RSP: 0018:ffff880102d8bc70 EFLAGS: 00000297
RAX: 0000000000007a74 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88041a92fa23 RDI: ffffffff8197d440
RBP: ffffffff81002f9e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffff01
R13: ffffffff810d7f50 R14: ffff880102d8bb88 R15: ffff880000000000
FS: 00007f549bcfd6e0(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4e90b09000 CR3: 0000000102c7d000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff810df760>] ? lookup_mnt+0x10/0x50
[<ffffffff810cdbe0>] ? __follow_mount+0x80/0xa0
[<ffffffff810cde23>] ? do_lookup+0x143/0x270
[<ffffffff810cffde>] ? __link_path_walk+0x1ce/0x1230
[<ffffffff810d7f50>] ? pollwake+0x0/0x60
[<ffffffff810d131d>] ? path_walk+0x7d/0x100
[<ffffffff810d14eb>] ? do_path_lookup+0x5b/0x60
[<ffffffff810d2467>] ? user_path_at+0x97/0xd0
[<ffffffff810500c0>] ? autoremove_wake_function+0x0/0x30
[<ffffffff810c7dda>] ? cp_new_stat+0x16a/0x180
[<ffffffff810c7be7>] ? vfs_fstatat+0x37/0x80
[<ffffffff81058569>] ? ktime_get_ts+0x69/0xd0
[<ffffffff810c7eaf>] ? sys_newstat+0x1f/0x50
[<ffffffff810d6b33>] ? sys_poll+0x73/0x110
[<ffffffff81003285>] ? device_not_available+0x15/0x20
[<ffffffff8100259b>] ? system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#7 stuck for 61s! [links:15071]
CPU 7:
Pid: 15071, comm: links Tainted: G D 2.6.32.32-grsec-bfsha #1 ProLiant DL380 G5
RIP: 0010:[<ffffffff816d4880>] [<ffffffff816d4880>] _spin_lock+0x10/0x20
RSP: 0018:ffff88003e05bc80 EFLAGS: 00000297
RAX: 0000000000007b74 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff88041a9b6ca4 RDI: ffffffff8197d440
RBP: ffffffff81002f9e R08: 0000000000000000 R09: 0000000000000000
R10: 00000000004de515 R11: 0000000000000246 R12: 0000000000000000
R13: ffff88029aa660a8 R14: ffffffff8131f2a4 R15: 0000000000000010
FS: 00007f6edf0346e0(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000a96000 CR3: 000000013d6a8000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff810df760>] ? lookup_mnt+0x10/0x50
[<ffffffff810cdbe0>] ? __follow_mount+0x80/0xa0
[<ffffffff810cde23>] ? do_lookup+0x143/0x270
[<ffffffff810cffde>] ? __link_path_walk+0x1ce/0x1230
[<ffffffff810d131d>] ? path_walk+0x7d/0x100
[<ffffffff810d14eb>] ? do_path_lookup+0x5b/0x60
[<ffffffff810d27ab>] ? do_filp_open+0x10b/0xc80
[<ffffffff816d28d1>] ? thread_return+0x61/0x430
[<ffffffff810d909e>] ? dput+0xae/0x160
[<ffffffff810de2b9>] ? expand_files+0x49/0x260
[<ffffffff810de51a>] ? alloc_fd+0x4a/0x140
[<ffffffff810c159d>] ? do_sys_open+0x9d/0x160
[<ffffffff8100259b>] ? system_call_fastpath+0x16/0x1b
Perhaps the RAX values give some hint as to the order in which the cores got stuck:
cpu0:RAX: 0000000000007974
cpu1:RAX: 0000000000007574
cpu2:RAX: 0000000000007674
cpu3:RAX: 0000000000007c74
cpu4:RAX: 0000000000007874
cpu5:RAX: 0000000000007774
cpu6:RAX: 0000000000007a74
cpu7:RAX: 0000000000007b74
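One possible way to read these values, assuming this kernel uses the x86 ticket spinlock with 8-bit tickets (as I understand 2.6.32 does when built for fewer than 256 CPUs): the low byte of AX would be the ticket currently being served, and the next byte would be the ticket drawn by the spinning CPU. The Python sketch below decodes the values on that assumption. The names decode and queue_order are my own, not anything from the kernel, and the layout should be checked against the actual kernel config before trusting the result.

```python
# RAX values copied from the register dumps above, keyed by CPU number.
RAX = {
    0: 0x7974, 1: 0x7574, 2: 0x7674, 3: 0x7C74,
    4: 0x7874, 5: 0x7774, 6: 0x7A74, 7: 0x7B74,
}


def decode(rax):
    """Split the low 16 bits of RAX into (ticket, now_serving).

    ASSUMPTION: 8-bit ticket spinlock layout, where AH holds the ticket
    this waiter drew and AL holds the ticket currently being served.
    """
    return (rax >> 8) & 0xFF, rax & 0xFF


def queue_order(rax_by_cpu):
    """Return (cpu, ticket, now_serving) tuples in queue order."""
    rows = [(cpu, *decode(rax)) for cpu, rax in rax_by_cpu.items()]
    # Sort by distance from now_serving, modulo 256, so the ordering
    # still holds if the ticket counter has wrapped around.
    return sorted(rows, key=lambda r: (r[1] - r[2]) % 256)


if __name__ == "__main__":
    for cpu, ticket, serving in queue_order(RAX):
        print(f"cpu{cpu}: ticket 0x{ticket:02x}, now serving 0x{serving:02x}")
```

If that layout assumption holds, every core sees ticket 0x74 as the one being served, and the eight cores hold tickets 0x75 through 0x7c, queued in the order cpu1, cpu2, cpu5, cpu4, cpu0, cpu6, cpu7, cpu3. That would mean none of the spinning contexts shown here holds the lock, and whoever drew ticket 0x74 never released it. RDI is ffffffff8197d440 in every dump, which is consistent with all cores waiting on the same lock. The D taint flag indicates an earlier oops, so a task that died while holding the lock is one possible explanation, but that is only a guess.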
Any help would be much appreciated.
Edit: Added stack traces from the other cores.