Rapidly enabling and disabling the RBAC system causes a system crash
Posted: Mon Aug 12, 2013 10:47 am
Hi there,
I am a software engineer at Wind River (a subsidiary of Intel). We recently ported the GRsecurity patch (GRSecurity 2.9.1 -- 201207080925) to Wind River Linux as a critical part of our security solution. The target boards are Intel Atom based or Freescale i.MX6 based platforms (2 or 4 cores). However, when we run an endurance test like:
$ while true; do
gradm -E;
echo <password>|gradm -D;
done
The system crashes frequently (typically in less than 5 minutes), and the call trace looks like:
[ 164.729304] BUG: unable to handle kernel NULL pointer dereference at 00000190
[ 164.729388] IP: [<0031a6b0>] __full_lookup+0x30/0xd0
[ 164.729447] *pdpt = 000000003501b001 *pde = 0000000000000000
[ 164.729495] Oops: 0000 [#1] PREEMPT SMP
[ 164.729538] LTT NESTING LEVEL : 0
[ 164.729567] Modules linked in: can_dev pch_can cfg80211 iwlwifi mac80211 cdc_acm usbnet cdc_ncm arc4 ftdi_sio minix llc stp bridge x_tables ip_tables iptable_filter ip6_tables ip6table_filter nf_nat iptable_nat iptable_raw ip6table_raw xt_conntrack ipt_REJECT ip6t_REJECT iptable_mangle ip6table_mangle ipt_MASQUERADE xt_tcpudp xt_limit xt_TCPMSS
[ 164.729880]
[ 164.729911] Pid: 4593, comm: multiwan Not tainted 3.4.43-grsec-WR5.0.1.0_standard #6 N/A N/A/nanoETXexpress-TT
[ 164.729973] EIP: 0060:[<0031a6b0>] EFLAGS: 00010217 CPU: 1
[ 164.730009] EIP is at __full_lookup+0x30/0xd0
[ 164.730038] EAX: c2051024 EBX: 00000000 ECX: 00000000 EDX: 00000002
[ 164.730069] ESI: 00800001 EDI: 000068df EBP: f5071be8 ESP: f5071bd0
[ 164.730101] DS: 0068 ES: 0068 FS: 00d8 GS: 0033 SS: 0068
[ 164.730131] CR0: 80050033 CR2: 00000190 CR3: 01c05020 CR4: 000007f0
[ 164.730159] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 164.730185] DR6: ffff0ff0 DR7: 00000400
[ 164.730214] Process multiwan (pid: 4593, ti=f5fcb688 task=f5fcb300 task.ti=f5fcb688)
[ 164.730239] Stack:
[ 164.730258] f6cdb6d0 f6cdb6d0 c243c880 c243c780 c243c880 00800001 f5071c30 0031a85d
[ 164.730344] 00800001 00000000 f5071c38 00000000 c243c8cc 017472db f6cdb6c0 f6cdb6d0
[ 164.730420] c243c880 00000000 000068df 00000000 f6cdb6d0 00000000 c243c880 f5fcb300
[ 164.730493] Call Trace:
[ 164.730534] [<00800001>] ? 0x800000
[ 164.730568] [<0031a85d>] __chk_obj_label+0x10d/0x3a0
[ 164.730600] [<00800001>] ? 0x800000
[ 164.730635] [<000068df>] ? show_trace+0x1f/0x30
[ 164.730669] [<0031d4e0>] gr_search_file+0x50/0x160
[ 164.730705] [<00404010>] ? i915_read_indexed+0x40/0x40
[ 164.730739] [<00321138>] gr_acl_handle_open+0x48/0x170
[ 166.407383] [<0012556d>] do_last.isra.30+0x2cd/0x8e0
[ 166.407470] [<00125c1f>] path_openat+0x9f/0x3c0
[ 166.407553] [<00126142>] do_filp_open+0x32/0x80
[ 166.407872] [<0011588a>] do_sys_open+0xfa/0x2b0
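One common reading of a small, non-zero fault address such as the 00000190 reported in CR2 is that the code read a structure member through a NULL base pointer, so the faulting address equals the member's offset. The sketch below only illustrates that arithmetic. `struct example_label` and `member_fault_address` are hypothetical names, and the layout is invented so that one member lands at offset 0x190; it is not the real GRsecurity structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only so that one member lands at offset 0x190. */
struct example_label {
    unsigned char earlier_fields[0x190];
    void *member;
};

/*
 * Address the CPU would try to read for base->member, computed
 * arithmetically so that nothing is actually dereferenced.
 */
uintptr_t member_fault_address(uintptr_t base)
{
    return base + offsetof(struct example_label, member);
}
```

With a base of 0 (a NULL pointer), `member_fault_address(0)` yields 0x190, which is the kind of address the oops reports.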
The system keeps panicking, and this level of instability is unacceptable, so we had to debug it. The trace above is an easy-to-locate NULL pointer dereference. Here is what I found:
1806 static struct acl_object_label *
1807 __full_lookup(const struct dentry *orig_dentry, const struct vfsmount *orig_mnt,
1808 const ino_t curr_ino, const dev_t curr_dev,
1809 const struct acl_subject_label *subj, char **path, const int checkglob)
1810 {
1811 struct acl_subject_label *tmpsubj;
1812 struct acl_object_label *retval;
1813 struct acl_object_label *retval2;
1814
1815 tmpsubj = (struct acl_subject_label *) subj;
The crash originates at line 1815 above: subj is NULL when it is passed in as a parameter, so the crash happens at:
1817 do {
1818 retval = lookup_acl_obj_label(curr_ino, curr_dev, tmpsubj);
Because that leads to a NULL pointer dereference, the simple fix would be to bail out early when subj == NULL.
Unfortunately, we found that this strategy cannot save us, because there are too many places in the whole 86k+ line GRsecurity patch that access:
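A minimal user-space sketch of such an early-return guard is shown here. The structures are simplified stand-ins for the GRsecurity ones, the flat `objs` array and the `parent_subject` walk are assumptions made for illustration, and `lookup_guarded` is a hypothetical name, not a function from the patch.

```c
#include <stddef.h>

/* Simplified stand-ins; the real GRsecurity structures have many more fields. */
struct acl_object_label {
    unsigned long inode;
    unsigned int mode;
};

struct acl_subject_label {
    struct acl_object_label *objs;            /* assumed flat array of objects */
    size_t nobjs;
    struct acl_subject_label *parent_subject; /* assumed parent link */
};

/*
 * Hypothetical lookup: search the subject, then each parent in turn,
 * for an object with the given inode number.
 */
struct acl_object_label *
lookup_guarded(unsigned long ino, const struct acl_subject_label *subj)
{
    const struct acl_subject_label *tmpsubj = subj;

    /* The guard: return instead of dereferencing a NULL subject. */
    if (subj == NULL)
        return NULL;

    do {
        for (size_t i = 0; i < tmpsubj->nobjs; i++)
            if (tmpsubj->objs[i].inode == ino)
                return &tmpsubj->objs[i];
    } while ((tmpsubj = tmpsubj->parent_subject) != NULL);

    return NULL;
}
```

Every caller then has to cope with a NULL result, so the guard moves the problem up one level rather than removing it.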
@@ -1594,6 +1621,27 @@ struct task_struct {
unsigned long default_timer_slack_ns;
+ struct acl_subject_label *acl;
+ struct acl_role_label *role;
in the form of 'task->acl', 'task->role', 'current->acl', or 'current->role'. Only a small part of the code checks whether the fetched 'acl' or 'role' is NULL before dereferencing it, and fixing all of these call sites has become a nightmare we cannot handle!
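One possible way to shrink the number of call sites that need an individual fix is to route every access through a single accessor that reads the pointer once and checks that same value. The following is only a user-space sketch using C11 atomics. `struct task`, `task_acl_snapshot`, and `task_acl_mode_or` are hypothetical names, and `mode` stands in for whatever field a caller actually needs.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Simplified stand-in for the real subject label. */
struct acl_subject_label {
    unsigned int mode;
};

/* Stand-in for task_struct, holding only the pointer of interest. */
struct task {
    _Atomic(struct acl_subject_label *) acl;
};

/*
 * Hypothetical accessor: load the pointer exactly once, so the NULL
 * check and the later use operate on the same value.
 */
static inline struct acl_subject_label *task_acl_snapshot(struct task *t)
{
    return atomic_load_explicit(&t->acl, memory_order_acquire);
}

/* Example caller: fall back to a default when no label is attached. */
unsigned int task_acl_mode_or(struct task *t, unsigned int fallback)
{
    struct acl_subject_label *acl = task_acl_snapshot(t);

    return acl ? acl->mode : fallback;
}
```

This sketch only covers the check-then-use window on the pointer itself. It does not keep the label alive if another CPU frees it while it is still in use, which in kernel code would need locking or a similar lifetime mechanism.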
Can anyone in this forum help? Many thanks in advance!