
acpid locking up

Posted: Wed Oct 02, 2002 4:23 pm
by torne
acpid is causing huge problems on my system, but works correctly with an identical non-grsecurity kernel.

What happens is this: acpid starts normally, but if you subsequently try to read its /proc/pid directory, the process reading the information hangs and can only be terminated by the kernel (sysrq-k). This means that ps hangs as soon as it reaches acpid (without printing anything for it), and even trying to ls /proc/pid just to display the contents locks up.

acpid seems not to be working either during this - it doesn't respond to the power button, in any case.

This causes all kinds of things to go horribly wrong: start-stop-daemon can't cope, for one.

Using an otherwise identical kernel without grsecurity enabled, it doesn't do this.

I haven't got the ACL system or any sysctl-controlled options turned on at all (I have not yet configured grsecurity), which vastly limits the scope of the bug. I can't get strace to attach to acpid, I can't view its maps file, I can't really get any debugging information at all =(

I'm building a selection of kernels with different options enabled/disabled to try and localise the problem, but I'm posting this now in case anyone has any bright ideas that could save me some time =)

I'm using grsecurity 1.9.7 and Debian's 2.4.19 source.

Followup will come once I've experimented more..

Torne

Posted: Wed Oct 02, 2002 7:03 pm
by torne
OK.. I forgot to try chpax'ing the binary. It fails when segmentation based noexec is enabled. It works with all the other options, and works fine with paging based noexec.

The only bit of relevant information I can think of is the maps file:
08048000-0804c000 r-xp 00000000 3a:0b 12007 /usr/sbin/acpid
0804c000-0804d000 rw-p 00004000 3a:0b 12007 /usr/sbin/acpid
0804d000-08050000 rwxp 00000000 00:00 0
40000000-40013000 r-xp 00000000 09:00 1069 /lib/ld-2.2.5.so
40013000-40014000 rw-p 00013000 09:00 1069 /lib/ld-2.2.5.so
40017000-4012a000 r-xp 00000000 09:00 1076 /lib/libc-2.2.5.so
4012a000-40130000 rw-p 00113000 09:00 1076 /lib/libc-2.2.5.so
40130000-40134000 rw-p 00000000 00:00 0
bfffc000-c0000000 rwxp ffffd000 00:00 0

If there's anything else that will help, let me know. Thanks =)

Torne

Posted: Thu Oct 03, 2002 11:58 am
by PaX Team
torne wrote:
OK.. I forgot to try chpax'ing the binary. It fails when segmentation based noexec is enabled. It works with all the other options, and works fine with paging based noexec.

If there's anything else that will help, let me know. Thanks =)

Could you enable ACPI debugging to see if that reports anything useful?

Posted: Fri Oct 04, 2002 12:42 pm
by torne
The debug mode doesn't seem to show any useful info.. however it does force the daemon to stay in the foreground and I managed to strace it. The output:

execve("/usr/sbin/acpid", ["acpid", "-d", "-d", "-d", "-c", "/etc/acpi/events", "-s", "/var/run/.acpid.socket"], [/* 24 vars */]) = 0
uname({sys="Linux", node="whitefang", ...}) = 0
brk(0) = 0x804d764
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=8733, ...}) = 0
old_mmap(NULL, 8733, PROT_READ, MAP_PRIVATE, 3, 0) = 0x273f3000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\30\222"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=1153784, ...}) = 0
old_mmap(NULL, 1166560, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x273f6000
mprotect(0x27509000, 40160, PROT_NONE) = 0
old_mmap(0x27509000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x113000) = 0x27509000
old_mmap(0x2750f000, 15584, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2750f000
close(3) = 0
munmap(0x273f3000, 8733) = 0
geteuid32() = 0
open("/proc/acpi/event", O_RDONLY) = 3
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
fcntl64(3, F_GETFL) = 0 (flags O_RDONLY)
fcntl64(3, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
read(3, 0x5a6c328f, 1) = -1 EAGAIN (Resource temporarily unavailable)
fcntl64(3, F_SETFL, O_RDONLY) = 0
unlink("/var/run/.acpid.socket") = 0
socket(PF_UNIX, SOCK_STREAM, 0) = 4
bind(4, {sin_family=AF_UNIX, path="/var/run/.acpid.socket"}, 110) = 0
listen(4, 10) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
chmod("/var/run/.acpid.socket", 0666) = 0
time(NULL) = 1033748595
brk(0) = 0x804d764
brk(0x804d78c) = 0x804d78c
brk(0x804e000) = 0x804e000
open("/etc/localtime", O_RDONLY) = 5
fstat64(5, {st_mode=S_IFREG|0644, st_size=1323, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x273f3000
read(5, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7\0\0\0\7\0"..., 4096) = 1323
close(5) = 0
munmap(0x273f3000, 4096) = 0
write(2, "[Fri Oct 4 17:23:15 2002] ", 27) = 27
write(2, "starting up\n", 12) = 12
rt_sigaction(SIGHUP, {0x8049da8, [HUP], SA_RESTART|0x4000000}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGINT, {0x8049d7c, [INT], SA_RESTART|0x4000000}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {0x8049d7c, [QUIT], SA_RESTART|0x4000000}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGTERM, {0x8049d7c, [TERM], SA_RESTART|0x4000000}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM], NULL, 8) = 0
open("/dev/null", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOTDIR (Not a directory)
open("/etc/acpi/events", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 5
fstat64(5, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
brk(0x8050000) = 0x8050000
getdents64(0x5, 0x804de30, 0x1000, 0xb) = 80
time(NULL) = 1033748595
write(2, "[Fri Oct 4 17:23:15 2002] ", 27) = 27
write(2, "DBG: parsing conf file /etc/acpi"..., 49) = 49
open("/etc/acpi/events/powerbtn", O_RDONLY) = 6
fstat64(6, {st_mode=S_IFREG|0644, st_size=425, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x273f3000
read(6, "# /etc/acpid/events/powerbtn\n# T"..., 4096) = 425
time(NULL) = 1033748595
write(2, "[Fri Oct 4 17:23:15 2002] ", 27) = 27
write(2, "DBG: key=\"event\" val=\"button["..., 42) = 42
time(NULL) = 1033748595
write(2, "[Fri Oct 4 17:23:15 2002] ", 27) = 27
write(2, "DBG: key=\"action\" val=\"/etc/a"..., 49) = 49
read(6, "", 4096) = 0
close(6) = 0
munmap(0x273f3000, 4096) = 0
munmap(0xa28e729a, 2498353563) = -1 EINVAL (Invalid argument)
getdents64(0x5, 0x804de30, 0x1000, 0x16) = 0

It dies on getdents64 on file handle 5. File handle 5 is the /etc/acpi/events directory. No idea why. I also straced a normal run (with page-based noexec instead of segmentation) and it does the above, then continues with:

close(5) = 0
time(NULL) = 1033749719
write(2, "[Fri Oct 4 17:41:59 2002] ", 27) = 27
write(2, "DBG: unblocking signals for rule"..., 38) = 38
rt_sigprocmask(SIG_UNBLOCK, [HUP INT QUIT TERM], NULL, 8) = 0
time(NULL) = 1033749719
write(2, "[Fri Oct 4 17:41:59 2002] ", 27) = 27
write(2, "1 rule loaded\n", 14) = 14

I cut off at this point because the next call is to poll the ACPI events interface for data. This means that the failure is occurring before it has even started polling the ACPI interface - it is in fact dying in that getdents64 call as far as I can see.
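When reading a trace like the one above, it can help to list every call that returned an error before deciding which call is the culprit. The following is only a minimal sketch in Python, assuming the trace is in the plain `name(args) = ret ERRNO (message)` layout shown here; `failed_calls` and `SYSCALL_RE` are hypothetical names, not part of strace or any library.

```python
import re

# Matches strace lines such as:
#   open("/dev/null", O_RDONLY) = -1 ENOTDIR (Not a directory)
# The errno part is optional, so successful calls match too.
SYSCALL_RE = re.compile(
    r'^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>0x[0-9a-fA-F]+|-?\d+)'
    r'(?:\s+(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\))?'
)


def failed_calls(trace_lines):
    """Return (name, args, errno) for every call that reported an errno."""
    failures = []
    for line in trace_lines:
        m = SYSCALL_RE.match(line.strip())
        if m and m.group("errno"):
            failures.append((m.group("name"), m.group("args"), m.group("errno")))
    return failures


if __name__ == "__main__":
    # A few lines copied from the end of the trace above.
    tail = [
        'close(6) = 0',
        'munmap(0x273f3000, 4096) = 0',
        'munmap(0xa28e729a, 2498353563) = -1 EINVAL (Invalid argument)',
        'getdents64(0x5, 0x804de30, 0x1000, 0x16) = 0',
    ]
    for name, args, err in failed_calls(tail):
        print(f"{name}({args}) failed with {err}")
```

Not every error in such a list is a problem (the ENOENT for /etc/ld.so.preload is routine, for example), so the output is a starting point for inspection rather than a diagnosis.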

Well.. that's my explanation of what's going on to the best of my ability. I hope this is enough to allow you to work out *why*, which is the part that stumps me. =)

Torne

Posted: Fri Oct 04, 2002 2:08 pm
by PaX Team
torne wrote:
munmap(0x273f3000, 4096) = 0
munmap(0xa28e729a, 2498353563) = -1 EINVAL (Invalid argument)
getdents64(0x5, 0x804de30, 0x1000, 0x16) = 0

It dies on getdents64 on file handle 5. File handle 5 is the /etc/acpi/events directory. No idea why.

oh boy... am i ashamed of myself now ;P. the problem is not with getdents64(); notice that it does return from the kernel properly. however, just above it you can see a funny-looking munmap(). it's funny in that it tries to unmap memory with some seemingly bogus parameters (with SEGMEXEC no one should try to touch memory above 0x60000000). it's probably a bug somewhere in acpid/glibc/etc, but what's mine is the handling of this bogus munmap(). i simply forgot to release the mmap semaphore for the process when rejecting it (or even better, i should validate the parameters outside of the locked region), so the process hangs on its very next page fault, when the page fault handler tries to reacquire the semaphore it already holds. what a stupid bug; the patch will be in the CVS by the time you read this (same site as the grsecurity CVS).
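The shape of this bug can be illustrated with a small user-space sketch in Python. This is an analogy under stated assumptions, not the PaX code or the actual patch: a plain non-reentrant `threading.Lock` stands in for the kernel's mmap semaphore, the range check is a guess at the kind of validation involved, and `AddressSpace`, `munmap_buggy`, `munmap_fixed` and `page_fault_would_hang` are hypothetical names. Only the 0x60000000 limit and the munmap() arguments come from the thread.

```python
import errno
import threading

# User-space limit under SEGMEXEC, as stated in the post above.
SEGMEXEC_TASK_SIZE = 0x60000000


class AddressSpace:
    """Toy stand-in for a process's memory descriptor."""

    def __init__(self):
        # Non-reentrant lock playing the role of the mmap semaphore.
        self.mmap_sem = threading.Lock()

    def munmap_buggy(self, addr, length):
        self.mmap_sem.acquire()
        if addr + length > SEGMEXEC_TASK_SIZE:
            # BUG: early return on the error path leaves the lock held.
            return -errno.EINVAL
        # ... the real unmapping work would happen here ...
        self.mmap_sem.release()
        return 0

    def munmap_fixed(self, addr, length):
        # Validate the parameters before taking the lock at all.
        if addr + length > SEGMEXEC_TASK_SIZE:
            return -errno.EINVAL
        with self.mmap_sem:
            pass  # ... the real unmapping work would happen here ...
        return 0

    def page_fault_would_hang(self):
        # A real fault handler would block here; a non-blocking attempt
        # lets us detect the leaked lock without actually hanging.
        acquired = self.mmap_sem.acquire(blocking=False)
        if acquired:
            self.mmap_sem.release()
        return not acquired


if __name__ == "__main__":
    for name in ("munmap_buggy", "munmap_fixed"):
        space = AddressSpace()
        ret = getattr(space, name)(0xA28E729A, 2498353563)
        print(name, "returned", ret, "| next fault hangs:",
              space.page_fault_would_hang())
```

Both variants reject the bogus call with EINVAL, which is why the strace output looks harmless; the difference only shows up on the next attempt to take the lock.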

Posted: Fri Oct 04, 2002 6:25 pm
by torne
Aha! I need to practise reading strace output a bit more methinks (I normally code in Java so have nice convenient backtraces around all the time). I'll check it out in a few days because my server's going down so I can move house.

Thanks,
Torne