Page 1 of 1

pte_unmap and do_anonymous_page

PostPosted: Fri Dec 10, 2010 4:22 pm
by storm
I was looking over the PaX patch (thanks PaX Team!) and found something I had a question about. Most likely it is just due to my unfamiliarity with the code, but wanted to ask anyhow.

The specified semantics for do_anonymous_page (mm/memory.c) state that entry is with the PTE mapped but exit with it unmapped. Granted, with many architectures this is a moot point, but in the PaX patched version the code path through if (!(flags & FAULT_FLAG_WRITE)) appears to leave the PTE mapped on exit.

Also, PaX removes the check_stack_guard_page() function. Is this because PaX ensures a guard page via the expand_* functions?

Re: pte_unmap and do_anonymous_page

PostPosted: Fri Dec 10, 2010 8:07 pm
by spender
I can answer the second half at least ;) Some of this probably isn't new to you, but I'm not sure if it's been written about on the forums yet so I'm including it here.

PaX replaced the single-page stack guard page with a 64kb enforced gap below the stack. The amount is configurable via /proc/sys/vm/heap_stack_gap. The check itself is performed by check_heap_stack_gap(). The use of an enforced gap removes the need for any of the hacks required by the upstream solution in order to fool userland into thinking that there's nothing below the stack. It affected more than just LVM, though the effect is less visible in other kinds of applications (i.e., not a crash+exit, but a silent change in how they operate, particularly in those that attempt to implement their own stack guards).

The amount of the gap is important too. If you have a decent amount of local variables, gcc generates code that would jump over any single-page stack guard. This behavior is determined by the compiler: MSVC for instance will use an implicit alloca() which triggers stack expansion on each additional page needed. To illustrate, here's a simple program:

Code: Select all
#include <stdio.h>

int main(void)
{
        char blah[16384];
        char blah2[16384];
        strcpy(blah2, "yes");

        return 0;
}


Here's a portion of the resulting disassembly of the code generated by gcc 4.3.2:
Code: Select all
080483a4 <main>:
 80483a4:       8d 4c 24 04             lea    ecx,[esp+0x4]
 80483a8:       83 e4 f0                and    esp,0xfffffff0
 80483ab:       ff 71 fc                push   DWORD PTR [ecx-0x4]
 80483ae:       55                      push   ebp
 80483af:       89 e5                   mov    ebp,esp
 80483b1:       51                      push   ecx
 80483b2:       81 ec 14 80 00 00       sub    esp,0x8014
 80483b8:       c7 44 24 08 04 00 00    mov    DWORD PTR [esp+0x8],0x4
 80483bf:       00
 80483c0:       c7 44 24 04 b0 84 04    mov    DWORD PTR [esp+0x4],0x80484b0


So the choice of 64kb for the default value of the gap comes from some insight into the prevalence of > 4KB of local variables (think PATH_MAX-sized strings to store filenames, etc) in order to prevent the code generated by gcc from being able to skip over any gap/guard and scribble on memory in the heap.

-Brad

Re: pte_unmap and do_anonymous_page

PostPosted: Sat Dec 11, 2010 5:43 pm
by storm
Thanks Brad, yes the 64kb gap makes a lot of sense. Agreed on the ease of jumping over a single page. :)

Re: pte_unmap and do_anonymous_page

PostPosted: Sun Dec 12, 2010 1:05 pm
by spender
As for the first part of the question, what you're seeing in mm/memory.c:do_anonymous_page() is mostly just a revert of Linus' stack guard code. I assume you're talking about the pte_unmap(page_table) which exists above the flags check in the vanilla kernel, but is below it in the PaX kernel. In Linus' code he modifies page_table within the flags check block, requiring the unmap above it, but page_table isn't touched there in the PaX (and older vanilla) versions. If you take a look at any vanilla kernel prior to Linus' stack guard code, you'll notice it looks the same as the PaX kernel. So you're not seeing any oddity, it's just a patch revert.

-Brad

Re: pte_unmap and do_anonymous_page

PostPosted: Mon Dec 13, 2010 8:00 pm
by storm
Ok, I think I'm grokking this better. The thing I missed was the pte_unmap_unlock() call down below the "unlock:" label, which calls pte_unmap(), guaranteeing the stated function exit conditions.

Thanks!