Since the dawn of time, a kernel stack overflow has been a real PITA to debug, has caused nondeterministic crashes some time after the actual overflow, and has generally been easy to exploit for root.

With this series, arches can enable HAVE_ARCH_VMAP_STACK. Arches that enable it (just x86 for now) get virtually mapped stacks with guard pages. This causes reliable faults when the stack overflows. If the arch implements it well, we get a nice OOPS on stack overflow (as opposed to panicking directly or otherwise exploding badly). On x86, the OOPS is nice, has a usable call trace, and the overflowing task is killed cleanly.

On my laptop, this adds about 1.5µs of overhead to task creation, which seems to be mainly caused by vmalloc inefficiently allocating individual pages even when a higher-order page is available on the freelist.

This does not address interrupt stacks. It also does not address the possibility of privilege escalation by a controlled stack overflow that overwrites thread_info without hitting the guard page. I'll send patches to address the latter issue once this series lands.

It's worth noting that s390 has an arch-specific gcc feature that detects stack overflows by adjusting function prologues. Arches with features like that may wish to avoid using vmapped stacks to minimize the performance hit.

Ingo, once this gets a bit more review, would it make sense to throw it into a separate branch in -tip? I wouldn't mind seeing some -next testing to give people a chance to shake out problems. I'm particularly interested in whether there are any drivers that expect virt_to_phys to work on stack addresses. (I know that virtio-net used to, but I fixed that a while back.)
Changes from v1:
 - Fix rewind_stack_and_do_exit (Josh)
 - Fix deadlock under load
 - Clean up generic stack vmalloc code
 - Many other minor fixes

Andy Lutomirski (12):
  x86/cpa: In populate_pgd, don't set the pgd entry until it's populated
  x86/cpa: Warn if kernel_unmap_pages_in_pgd is used inappropriately
  mm: Track NR_KERNEL_STACK in KiB instead of number of stacks
  mm: Move memcg stack accounting to account_kernel_stack
  fork: Add generic vmalloced stack support
  x86/die: Don't try to recover from an OOPS on a non-default stack
  x86/dumpstack: When OOPSing, rewind the stack before do_exit
  x86/dumpstack: When dumping stack bytes due to OOPS, start with regs->sp
  x86/dumpstack: Try harder to get a call trace on stack overflow
  x86/dumpstack/64: Handle faults when printing the "Stack:" part of an OOPS
  x86/mm/64: Enable vmapped stacks
  x86/mm: Improve stack-overflow #PF handling

Ingo Molnar (1):
  x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()

 arch/Kconfig                        | 29 +++++++++++++
 arch/ia64/include/asm/thread_info.h |  2 +-
 arch/x86/Kconfig                    |  1 +
 arch/x86/entry/entry_32.S           | 11 +++++
 arch/x86/entry/entry_64.S           | 11 +++++
 arch/x86/include/asm/switch_to.h    | 28 +++++++++++-
 arch/x86/include/asm/traps.h        |  6 +++
 arch/x86/kernel/dumpstack.c         | 19 ++++++++-
 arch/x86/kernel/dumpstack_32.c      |  4 +-
 arch/x86/kernel/dumpstack_64.c      | 16 +++++--
 arch/x86/kernel/traps.c             | 32 ++++++++++++++
 arch/x86/mm/fault.c                 | 39 +++++++++++++++++
 arch/x86/mm/init_64.c               | 27 ------------
 arch/x86/mm/pageattr.c              |  7 ++-
 arch/x86/mm/tlb.c                   | 15 +++++++
 drivers/base/node.c                 |  3 +-
 fs/proc/meminfo.c                   |  2 +-
 include/linux/mmzone.h              |  2 +-
 include/linux/sched.h               | 15 +++++++
 kernel/fork.c                       | 85 ++++++++++++++++++++++++++++--------
 mm/page_alloc.c                     |  3 +-
 21 files changed, 295 insertions(+), 62 deletions(-)

-- 
2.5.5