I'm following up on this bug with my latest tests.
First of all: this is NOT a PaX bug. It also happens on the upstream kernel, so sorry for posting here. I missed that test while trying to assess the problem (I didn't think upstream would break Xen support).
After a thousand bisects and .config tests I found that the problem is related to CONFIG_RANDOMIZE_BASE (if I disable only that and keep the rest of my config, the problem goes away), and I think it's related to this commit from Kees Cook:
commit 6145cfe394a7f138f6b64491c5663f97dba12450
Author: Kees Cook <keescook@chromium.org>
Date: Thu Oct 10 17:18:18 2013 -0700
x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64
On 64-bit, this raises the maximum location to -1 GiB (from -1.5 GiB),
the upper limit currently, since the kernel fixmap page mappings need
to be moved to use the other 1 GiB (which would be the theoretical
limit when building with -mcmodel=kernel).
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-287 ... romium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The default RANDOMIZE_BASE_MAX_OFFSET for 64-bit is 0x40000000 (1 GiB), which then becomes KERNEL_IMAGE_SIZE because of:
+#define KERNEL_IMAGE_SIZE_DEFAULT (512 * 1024 * 1024)
+#if defined(CONFIG_RANDOMIZE_BASE) && \
+ CONFIG_RANDOMIZE_BASE_MAX_OFFSET > KERNEL_IMAGE_SIZE_DEFAULT
+#define KERNEL_IMAGE_SIZE CONFIG_RANDOMIZE_BASE_MAX_OFFSET
+#else
+#define KERNEL_IMAGE_SIZE KERNEL_IMAGE_SIZE_DEFAULT
+#endif
which is then used here:
NEXT_PAGE(level2_kernel_pgt)
/*
* 512 MB kernel mapping. We spend a full page on this pagetable
* anyway.
*
* The kernel code+data+bss must not be bigger than that.
*
* (NOTE: at +512MB starts the module area, see MODULES_VADDR.
* If you want to increase this then increase MODULES_VADDR
* too.)
*/
PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
KERNEL_IMAGE_SIZE/PMD_SIZE)
But since KERNEL_IMAGE_SIZE is now 1 GiB, that comment is no longer true, right? (I mean, it's not a 512 MB mapping anymore.) Perhaps that's the reason level2_ident_pgt points to an empty page; some calculation seems off.
I confirmed that keeping CONFIG_RANDOMIZE_BASE enabled but changing CONFIG_RANDOMIZE_BASE_MAX_OFFSET to 0x20000000 fixes the crash.
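To make the arithmetic behind the two offsets concrete, here is a small userspace sketch (not kernel code) that mirrors the KERNEL_IMAGE_SIZE selection quoted above and counts how many 2 MiB PMD entries the PMDS() line would emit. The helper names kernel_image_size() and pmd_entries() are made up for illustration, and the sketch assumes the usual x86_64 values of a 2 MiB PMD_SIZE and 512 entries per page-table page. It only shows the sizes involved; it does not prove this is what breaks Xen.

```c
#include <stdio.h>

/* Assumed x86_64 values: 2 MiB large pages, 512 entries per PMD page. */
#define PMD_SIZE                  (2UL * 1024 * 1024)
#define PTRS_PER_PMD              512UL
#define KERNEL_IMAGE_SIZE_DEFAULT (512UL * 1024 * 1024)

/* Hypothetical helper: mirrors the #if/#else selection from the commit. */
static unsigned long kernel_image_size(unsigned long max_offset)
{
    return max_offset > KERNEL_IMAGE_SIZE_DEFAULT
               ? max_offset
               : KERNEL_IMAGE_SIZE_DEFAULT;
}

/* Hypothetical helper: the count passed to PMDS(), KERNEL_IMAGE_SIZE/PMD_SIZE. */
static unsigned long pmd_entries(unsigned long max_offset)
{
    return kernel_image_size(max_offset) / PMD_SIZE;
}

/* Print how much of the level2_kernel_pgt page a given offset would fill. */
static void report(unsigned long max_offset)
{
    printf("MAX_OFFSET=0x%lx -> %lu of %lu PMD entries\n",
           max_offset, pmd_entries(max_offset), PTRS_PER_PMD);
}
```

Calling report(0x40000000UL) gives 512 of 512 entries, so the kernel mapping fills the whole page table page and covers a full 1 GiB, while report(0x20000000UL) falls back to the default and gives 256 entries, which matches the "512 MB kernel mapping" comment.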
Not sure why it only affects Xen, though. I guess there's something else going on that I'm missing.
Thanks again!