Enable and disable RBAC system very fast causes system crash

Discuss usability issues, general maintenance, and general support issues for a grsecurity-enabled system.

Enable and disable RBAC system very fast causes system crash

Postby edward.yangx » Mon Aug 12, 2013 10:47 am

Hi there,
I am a software engineer from Wind River (subsidiary of Intel), recently, we ported GRsecurity patch (GRSecurity 2.9.1 -- 201207080925) into Wind River Linux as our security solution's critical part, the target board is intel-atom based or Freescale imx6 based platform (2 core/4 core). However, when run endurance test like:

$ while true; do
gradm -E;
echo <password>|gradm -D;
done

We frequently (basically in less than 5 minutes) got system crash, and calltrace looks like:
[ 164.729304] BUG: unable to handle kernel NULL pointer dereference at 00000190
[ 164.729388] IP: [<0031a6b0>] __full_lookup+0x30/0xd0
[ 164.729447] *pdpt = 000000003501b001 *pde = 0000000000000000
[ 164.729495] Oops: 0000 [#1] PREEMPT SMP
[ 164.729538] LTT NESTING LEVEL : 0
[ 164.729567] Modules linked in: can_dev pch_can cfg80211 iwlwifi mac80211 cdc_acm usbnet cdc_ncm arc4 ftdi_sio minix llc stp bridge x_tables ip_tables iptable_filter ip6_tables ip6table_filter nf_nat iptable_nat iptable_raw ip6table_raw xt_conntrack ipt_REJECT ip6t_REJECT iptable_mangle ip6table_mangle ipt_MASQUERADE xt_tcpudp xt_limit xt_TCPMSS
[ 164.729880]
[ 164.729911] Pid: 4593, comm: multiwan Not tainted 3.4.43-grsec-WR5.0.1.0_standard #6 N/A N/A/nanoETXexpress-TT
[ 164.729973] EIP: 0060:[<0031a6b0>] EFLAGS: 00010217 CPU: 1
[ 164.730009] EIP is at __full_lookup+0x30/0xd0
[ 164.730038] EAX: c2051024 EBX: 00000000 ECX: 00000000 EDX: 00000002
[ 164.730069] ESI: 00800001 EDI: 000068df EBP: f5071be8 ESP: f5071bd0
[ 164.730101] DS: 0068 ES: 0068 FS: 00d8 GS: 0033 SS: 0068
[ 164.730131] CR0: 80050033 CR2: 00000190 CR3: 01c05020 CR4: 000007f0
[ 164.730159] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 164.730185] DR6: ffff0ff0 DR7: 00000400
[ 164.730214] Process multiwan (pid: 4593, ti=f5fcb688 task=f5fcb300 task.ti=f5fcb688)
[ 164.730239] Stack:
[ 164.730258] f6cdb6d0 f6cdb6d0 c243c880 c243c780 c243c880 00800001 f5071c30 0031a85d
[ 164.730344] 00800001 00000000 f5071c38 00000000 c243c8cc 017472db f6cdb6c0 f6cdb6d0
[ 164.730420] c243c880 00000000 000068df 00000000 f6cdb6d0 00000000 c243c880 f5fcb300
[ 164.730493] Call Trace:
[ 164.730534] [<00800001>] ? 0x800000
[ 164.730568] [<0031a85d>] __chk_obj_label+0x10d/0x3a0
[ 164.730600] [<00800001>] ? 0x800000
[ 164.730635] [<000068df>] ? show_trace+0x1f/0x30
[ 164.730669] [<0031d4e0>] gr_search_file+0x50/0x160
[ 164.730705] [<00404010>] ? i915_read_indexed+0x40/0x40
[ 164.730739] [<00321138>] gr_acl_handle_open+0x48/0x170
[ 166.407383] [<0012556d>] do_last.isra.30+0x2cd/0x8e0
[ 166.407470] [<00125c1f>] path_openat+0x9f/0x3c0
[ 166.407553] [<00126142>] do_filp_open+0x32/0x80
[ 166.407872] [<0011588a>] do_sys_open+0xfa/0x2b0

The system keeps panic, and this stability is un-acceptable, so we have to debug in it. Anyway, above is an easy-to-locate NULL pointer dereferenc issue, I found:
1806 static struct acl_object_label *
1807 __full_lookup(const struct dentry *orig_dentry, const struct vfsmount *orig_mnt,
1808 const ino_t curr_ino, const dev_t curr_dev,
1809 const struct acl_subject_label *subj, char **path, const int checkglob)
1810 {
1811 struct acl_subject_label *tmpsubj;
1812 struct acl_object_label *retval;
1813 struct acl_object_label *retval2;
1814
1815 tmpsubj = (struct acl_subject_label *) subj;

Crash is caused by in above line 1815, subj equals to NULL when its passed in parameters. So will crash when:
1817 do {
1818 retval = lookup_acl_obj_label(curr_ino, curr_dev, tmpsubj);

Because that leads to NULL pointer deref, simply fix will be avoiding going ahead when check (subj == NULL).

Unfortunately, we found such strategy cannot save us, because there exists too many places in the whole GRsecurity's 86k+ lines patch accessing:
@@ -1594,6 +1621,27 @@ struct task_struct {
unsigned long default_timer_slack_ns;
+ struct acl_subject_label *acl;
+ struct acl_role_label *role;

via form of 'task->acl', or 'task->role', or 'current->acl', or 'current->role'... Only small part of the code will check fetched 'acl' or 'role' is NULL or not before using them to dereference, fixing all of them has become a nightmare we cannot handle!

Can anyone in this forum help? Many thanks in advance!
edward.yangx
 
Posts: 7
Joined: Mon Aug 12, 2013 10:27 am

Re: Enable and disable RBAC system very fast causes system c

Postby spender » Mon Aug 12, 2013 8:04 pm

Hi,

This problem was resolved a long time ago (I note you're using a year+ old patch that we don't support). The changes in our patch that fixed this issue were:

Code: Select all
commit 1cd7ca006456c1a0760d34b75484b594593faea5
Author: Brad Spengler <spender@grsecurity.net>
Date:   Thu Jul 12 21:53:37 2012 -0400

    Force STOP_MACHINE on if grsecurity is enabled

 init/Kconfig     |    2 +-
 security/Kconfig |    1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

commit cbcdd659a4d5a91f951cfccb34e346a221981888
Author: Brad Spengler <spender@grsecurity.net>
Date:   Thu Jul 12 21:14:51 2012 -0400

    Fix possible race on RBAC disable by ensuring all other
    CPUs are forced out of the kernel -- this is preferable to
    other locking methods that would impact performance of common paths
    for what should be a very infrequent operation that doesn't
    need to be fast
    Thanks to Mark Moseley for reporting and testing

 grsecurity/gracl.c |   33 +++++++++++++++++----------------
 1 files changed, 17 insertions(+), 16 deletions(-)

commit 5f733a29308f3298e0826db157309edfa990ee4b
Author: Brad Spengler <spender@grsecurity.net>
Date:   Thu Jul 12 20:46:54 2012 -0400

    Fix RBAC enable / special role exit race with fork
    Thanks to Mark Moseley for reporting and testing
   
    race looked like:
    >   cpu 1                               cpu 2
    >   fork begins
    >   fork calls dup_task_struct
    >                                       RBAC sets ->acl on all procs in tasklist
    >   fork completes, adds to tasklist
    >                                       RBAC sets enabled flag
    >   process exists with RBAC enabled
    >   and NULL ->acl

 grsecurity/gracl.c |   11 +++++------
 kernel/fork.c      |    5 +++--
 2 files changed, 8 insertions(+), 8 deletions(-)


A reminder as well that we do offer commercial support, given that you're deploying grsecurity in a commercial product.

Thanks,
-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

Re: Enable and disable RBAC system very fast causes system c

Postby edward.yangx » Mon Aug 12, 2013 9:15 pm

Hi spender,
Thanks a lot for you comments, using comercial edition is an option, but before that we need to win our first potential customer with a PoC (proof of concept), this post shows how we progress with the Poc here :)

I noticed in other topics (viewtopic.php?f=3&t=3060
) you said that the problem has been fixed in latest release, however, when I looked into latest released patch at http://grsecurity.net/stable/grsecurity ... 2151.patch, the stop_machine() solution seems there, and I simply ported it into our old patch (sorry, that's the only available stable one tested with whole Wind River Linux, so we hesitate to upgrade significantly)

I ported below part:
+ case GR_SHUTDOWN:
+ if ((gr_status & GR_READY)
+ && !(chkpw(gr_usermode, gr_system_salt, gr_system_sum))) {
+ stop_machine(gr_rbac_disable, NULL, NULL);
+ free_variables();
+ memset(gr_usermode, 0, sizeof (struct gr_arg));
+ memset(gr_system_salt, 0, GR_SALT_LEN);
+ memset(gr_system_sum, 0, GR_SHA_LEN);
+ gr_log_noargs(GR_DONT_AUDIT_GOOD, GR_SHUTS_ACL_MSG);

+ case GR_RELOAD:
+ if (!(gr_status & GR_READY)) {
+ gr_log_str(GR_DONT_AUDIT_GOOD, GR_RELOADI_ACL_MSG, GR_VERSION);
+ error = -EAGAIN;
+ } else if (!(chkpw(gr_usermode, gr_system_salt, gr_system_sum))) {
+ stop_machine(gr_rbac_disable, NULL, NULL);
+ free_variables();
+ error2 = gracl_init(gr_usermode);
+ if (!error2)
+ gr_log_str(GR_DONT_AUDIT_GOOD, GR_RELOAD_ACL_MSG, GR_VERSION);
+ else {
+ gr_log_str(GR_DONT_AUDIT, GR_RELOADF_ACL_MSG, GR_VERSION);
+ error = error2;
+ }
+ } else {
+ gr_log_str(GR_DONT_AUDIT, GR_RELOADF_ACL_MSG, GR_VERSION);
+ error = -EPERM;
+ }
+ break;

and introduced gr_rbac_disable(), #include <linux/stop_machine.h>, Kconfig modification accordingly, but the same issue didn't go away, our board will panic still :(

I noticed the log you pasted seems have good solution than simply NULL pointer checking, we also tried to disable SMP mode in target by accessing /sys, because we're using 2-core and 4-core target, that seems make situation a little better, but the target crashed after 2 minutes still when it has only one CPU-core online :(

Am I porting wrong part of your solution? Is it possible that we access real code of you posted commits (cbcdd659a4d5a91f951cfccb34e346a221981888) before we officially migrate to comercial support purchase? Thanks!
edward.yangx
 
Posts: 7
Joined: Mon Aug 12, 2013 10:27 am

Re: Enable and disable RBAC system very fast causes system c

Postby spender » Mon Aug 12, 2013 10:24 pm

Hi Edward,

Assuming you ported the gracl.c part correctly, you're still missing the fix to fork.c, which I've split out here:
http://grsecurity.net/~spender/rbac_fork.diff

-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

Re: Enable and disable RBAC system very fast causes system c

Postby edward.yangx » Mon Aug 12, 2013 10:34 pm

Hi spender,
To make things even more clear, I boot my target with parameter of 'maxcpus=1', the crash is still happening as before, I think this can prove we're not experiencing an SMP issue here.
I can see only one CPU is online via:
root@WR-IntelligentDevice:~# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 38
model name : Intel(R) Atom(TM) CPU E620 @ 600MHz
stepping : 1
microcode : 0x105
cpu MHz : 599.944
cache size : 512 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc up arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm tpr_shadow vnmi
bogomips : 1199.88
clflush size : 64

Then I run script of:
$ i=0; while :; do
gradm -E >/dev/null 2>&1;
echo windriver|gradm -D; echo "[ $(( ++i )) ]";
done

It ends up with crashing like:
Password:
196
Password:
197
[ 263.005485] BUG: unable to handle kernel NULL pointer dereference at 00000190
[ 263.005548] IP: [<0031aa80>] __full_lookup+0x30/0xd0
[ 263.005592] *pdpt = 0000000035e3e001 *pde = 0000000000000000
[ 263.005623] Oops: 0000 [#1] PREEMPT SMP
[ 263.005650] LTT NESTING LEVEL : 0
[ 263.005671] Modules linked in: cfg80211 mac80211 iwlwifi usbnet can_dev cdc_ncm pch_can ftdi_sio arc4 minix llc stp bridge x_tables ip_tables iptable_filter ip6_tables ip6table_filter nf_nat iptable_nat iptable_raw ip6table_raw xt_conntrack ipt_REJECT ip6t_REJECT iptable_mangle ip6table_m
edward.yangx
 
Posts: 7
Joined: Mon Aug 12, 2013 10:27 am

Re: Enable and disable RBAC system very fast causes system c

Postby edward.yangx » Tue Aug 13, 2013 1:24 am

I re-do the porting like below:
diff --git a/grsecurity/gracl.c b/grsecurity/gracl.c
index 31c0077..48ebc09 100644
--- a/grsecurity/gracl.c
+++ b/grsecurity/gracl.c
@@ -20,6 +20,7 @@
#include <linux/security.h>
#include <linux/grinternal.h>
#include <linux/pid_namespace.h>
+#include <linux/stop_machine.h>
#include <linux/fdtable.h>
#include <linux/percpu.h>
#include "../fs/mount.h"
@@ -2273,15 +2274,15 @@ gr_copy_label(struct task_struct *tsk)
/* plain copying of fields is already done by dup_task_struct */
tsk->signal->used_accept = 0;
tsk->acl_sp_role = 0;
- //tsk->acl_role_id = current->acl_role_id;
- //tsk->acl = current->acl;
- //tsk->role = current->role;
+ tsk->acl_role_id = current->acl_role_id;
+ tsk->acl = current->acl;
+ tsk->role = current->role;
tsk->signal->curr_ip = current->signal->curr_ip;
tsk->signal->saved_ip = current->signal->saved_ip;
if (current->exec_file)
get_file(current->exec_file);
- //tsk->exec_file = current->exec_file;
- //tsk->is_writable = current->is_writable;
+ tsk->exec_file = current->exec_file;
+ tsk->is_writable = current->is_writable;
if (unlikely(current->signal->used_accept)) {
current->signal->curr_ip = 0;
current->signal->saved_ip = 0;
@@ -3090,6 +3091,15 @@ int gr_check_secure_terminal(struct task_struct *task)
return 1;
}

+static int gr_rbac_disable(void *unused)
+{
+ pax_open_kernel();
+ gr_status &= ~GR_READY;
+ pax_close_kernel();
+
+ return 0;
+}
+
ssize_t
write_grsec_handler(struct file *file, const char * buf, size_t count, loff_t *ppos)
{
@@ -3174,15 +3184,12 @@ write_grsec_handler(struct file *file, const char * buf, size_t count, loff_t *p
case GR_SHUTDOWN:
if ((gr_status & GR_READY)
&& !(chkpw(gr_usermode, gr_system_salt, gr_system_sum))) {
- pax_open_kernel();
- gr_status &= ~GR_READY;
- pax_close_kernel();
-
- gr_log_noargs(GR_DONT_AUDIT_GOOD, GR_SHUTS_ACL_MSG);
+ stop_machine(gr_rbac_disable, NULL, NULL);
free_variables();
memset(gr_usermode, 0, sizeof (struct gr_arg));
memset(gr_system_salt, 0, GR_SALT_LEN);
memset(gr_system_sum, 0, GR_SHA_LEN);
+ gr_log_noargs(GR_DONT_AUDIT_GOOD, GR_SHUTS_ACL_MSG);
} else if (gr_status & GR_READY) {
gr_log_noargs(GR_DONT_AUDIT, GR_SHUTF_ACL_MSG);
error = -EPERM;
@@ -3208,12 +3215,9 @@ write_grsec_handler(struct file *file, const char * buf, size_t count, loff_t *p
error = -EAGAIN;
} else if (!(chkpw(gr_usermode, gr_system_salt, gr_system_sum))) {
preempt_disable();
-
- pax_open_kernel();
- gr_status &= ~GR_READY;
- pax_close_kernel();
-
+ stop_machine(gr_rbac_disable, NULL, NULL);
free_variables();
+
if (!(error2 = gracl_init(gr_usermode))) {
preempt_enable();
gr_log_str(GR_DONT_AUDIT_GOOD, GR_RELOAD_ACL_MSG, GR_VERSION);
diff --git a/kernel/fork.c b/kernel/fork.c
index 418d22c..df3ccff 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1381,8 +1381,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
if (clone_flags & CLONE_THREAD)
p->tgid = current->tgid;

- gr_copy_label(p);
-
p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
/*
* Clear TID on mm_release()?
@@ -1451,6 +1449,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
/* Need tasklist lock for parent etc handling! */
write_lock_irq(&tasklist_lock);

+ /* synchronizes with gr_set_acls() */
+ gr_copy_label(p);
+
/*
* The state of the parent's TIF_KTRACE flag may have changed
* since it was copied in dup_task_struct() so we re-copy it here.

and I added 'maxcpus=1' still at boot command line, the board crashed the same way in less than 200 iterations :(
edward.yangx
 
Posts: 7
Joined: Mon Aug 12, 2013 10:27 am

Re: Enable and disable RBAC system very fast causes system c

Postby spender » Tue Aug 13, 2013 7:12 am

The port looks mostly correct, here's the full interdiff: http://grsecurity.net/~spender/rbac_inter.diff

If it's not the enable/disable race, then it's a completely different problem (which I can't determine further without additional information like a full call trace, vmlinux, etc). I do know however that at the time of the fix, I ran the same exact test as you of looped enable/disable without any issue.

-Brad
spender
 
Posts: 2185
Joined: Wed Feb 20, 2002 8:00 pm

Re: Enable and disable RBAC system very fast causes system c

Postby edward.yangx » Tue Aug 13, 2013 10:28 am

Thanks for the whole patch, spender, that makes racing issue clear to all of us. Anyway, it doesn't work in our patch, maybe it's because we're using a too old patch, so the racing is a different one can be reproduced in single core.

We now added some (20~30) NULL pointer check logic in our old patch, the system can survive more than 1000 iterations loop with 1 second sleep between 'gradm -E' and 'gradm -D'. If no sleep at all, the system will crash still, but that becomes a little bit more acceptable.

Thank you for the patience and all the support! I'll save your patch well since apparently it brings value we don't have right now :)
edward.yangx
 
Posts: 7
Joined: Mon Aug 12, 2013 10:27 am

Re: Enable and disable RBAC system very fast causes system c

Postby edward.yangx » Wed Aug 14, 2013 10:14 pm

Hi spender,
Looking at the code again, how do you think if I fix my problem this way:
1) Integrate SMP racing solution you gave in your post
2) Remove 'task->acl = NULL' statements in free_variables() function as below:
diff --git a/grsecurity/gracl.c b/grsecurity/gracl.c
index e7f7462..4d7fe10 100644
--- a/grsecurity/gracl.c
+++ b/grsecurity/gracl.c
@@ -963,10 +963,10 @@ free_variables(void)

read_lock(&tasklist_lock);
do_each_thread(task2, task) {
- task->acl_sp_role = 0;
- task->acl_role_id = 0;
- task->acl = NULL;
- task->role = NULL;
+ // task->acl_sp_role = 0;
+ // task->acl_role_id = 0;
+ // task->acl = NULL;
+ // task->role = NULL;
} while_each_thread(task2, task);
read_unlock(&tasklist_lock);

Will above bring any side effect in your perspective? Looking forward to get your opinion. Thanks!
edward.yangx
 
Posts: 7
Joined: Mon Aug 12, 2013 10:27 am


Return to grsecurity support