On Fri, Nov 10, 2017 at 8:30 PM, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
> Thanks, everyone for all the reviews thus far.  I hope I managed to
> address all the feedback given so far, except for the TODOs of
> course.  This is a pretty minor update compared to v1->v2.
>
> These patches are all on top of Andy's entry changes here:
>
>  https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/entry_consolidation
>
> Changes from v2:
>  * Reword documentation removing "we"
>  * Fix some whitespace damage
>  * Fix up MAX ASID values off-by-one noted by Peter Z
>  * Change CodingStyle stuff from Borislav comments
>  * Always use _KERNPG_TABLE for pmd_populate_kernel().
>
> Changes from v1:
>  * Updated to be on top of Andy L's new entry code
>  * Allow global pages again, and use them for pages mapped into
>    userspace page tables.
>  * Use trampoline stack instead of process stack at entry so no
>    longer need to map process stack (big win in fork() speed)
>  * Made the page table walking less generic by restricting it
>    to kernel addresses and !_PAGE_USER pages.
>  * Added a debugfs file to enable/disable CR3 switching at
>    runtime.  This does not remove all the KAISER overhead, but
>    it removes the largest source.
>  * Use runtime disable with Xen to permit Xen-PV guests with
>    KAISER=y.
>  * Moved assembly code from "core" to "prepare assembly" patch
>  * Pass full register name to asm macros
>  * Remove double stack switch in entry_SYSENTER_compat
>  * Disable vsyscall native case when KAISER=y
>  * Separate PER_CPU_USER_MAPPED generic definitions from use
>    by arch/x86/.
>
> TODO:
>  * Allow dumping the shadow page tables with the ptdump code
>  * Put LDT at top of userspace
>  * Create separate tlb flushing functions for user and kernel
>  * Chase down the source of the new !CR4.PGE warning that 0day
>    found with i386
>
> ---
>
> tl;dr:
>
> KAISER makes it harder to defeat KASLR, but makes syscalls and
> interrupts slower.
> These patches are based on work from a team at Graz University of
> Technology posted here[1].  The major addition is support for Intel
> PCIDs which builds on top of Andy Lutomirski's PCID work merged for
> 4.14.  PCIDs make KAISER's overhead very reasonable for a wide
> variety of use cases.
>
> Full Description:
>
> KAISER is a countermeasure against attacks on kernel address
> information.  There are at least three existing, published,
> approaches using the shared user/kernel mapping and hardware features
> to defeat KASLR.  One approach referenced in the paper locates the
> kernel by observing differences in page fault timing between
> present-but-inaccessible kernel pages and non-present pages.
>
> KAISER addresses this by unmapping (most of) the kernel when
> userspace runs.  It leaves the existing page tables largely alone and
> refers to them as "kernel page tables".  For running userspace, a new
> "shadow" copy of the page tables is allocated for each process.  The
> shadow page tables map all the same user memory as the "kernel" copy,
> but map only a minimal set of kernel memory.
>
> When we enter the kernel via syscalls, interrupts or exceptions,
> page tables are switched to the full "kernel" copy.  When the system
> switches back to user mode, the "shadow" copy is used.  Process
> Context IDentifiers (PCIDs) are used to ensure that the TLB is not
> flushed when switching between page tables, which makes syscalls
> roughly 2x faster than without it.  PCIDs are usable on Haswell and
> newer CPUs (the ones with "v4", or called fourth-generation Core).
>
> The minimal kernel page tables try to map only what is needed to
> enter/exit the kernel such as the entry/exit functions, interrupt
> descriptors (IDT) and the kernel trampoline stacks.  This minimal set
> of data can still reveal the kernel's ASLR base address.
> But this minimal kernel data is all trusted, which makes it harder
> to exploit than data in the kernel direct map, which contains loads
> of user-controlled data.
>
> KAISER will affect performance for anything that does system calls or
> interrupts: everything.  Just the new instructions (CR3 manipulation)
> add a few hundred cycles to a syscall or interrupt.  Most workloads
> that we have run show single-digit regressions.  5% is a good round
> number for what is typical.  The worst we have seen is a roughly 30%
> regression on a loopback networking test that did a ton of syscalls
> and context switches.  More details about possible performance
> impacts are in the new Documentation/ file.
>
> This code is based on a version I downloaded from
> (https://github.com/IAIK/KAISER).  It has been heavily modified.
>
> The approach is described in detail in a paper[2].  However, there is
> some incorrect information in the paper, both on how Linux and
> the hardware works.  For instance, I do not share the opinion that
> KAISER has "runtime overhead of only 0.28%".  Please rely on this
> patch series as the canonical source of information about this
> submission.
>
> Here is one example of how the kernel image grows with CONFIG_KAISER
> on and off.  Most of the size increase is presumably from additional
> alignment requirements for mapping entry/exit code and structures.
>
>     text    data     bss      dec filename
> 11786064 7356724 2928640 22071428 vmlinux-nokaiser
> 11798203 7371704 2928640 22098547 vmlinux-kaiser
>   +12139  +14980       0   +27119
>
> To give folks an idea what the performance impact is like, I took
> the following test and ran it single-threaded:
>
>  https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c
>
> It's a pretty quick syscall so this shows how much KAISER slows
> down syscalls (and how much PCIDs help).
> The units here are lseeks/second:
>
>         no kaiser: 5.2M
>     kaiser+  pcid: 3.0M
>     kaiser+nopcid: 2.2M
>
> "nopcid" is literally with the "nopcid" command-line option which
> turns PCIDs off entirely.
>
> Thanks to:
> The original KAISER team at Graz University of Technology.
> Andy Lutomirski for all the help with the entry code.
> Kirill Shutemov for a helpful review of the code.
>
> 1. https://github.com/IAIK/KAISER
> 2. https://gruss.cc/files/kaiser.pdf
>
> --
>
> The code is available here:
>
>  https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/
>
>  Documentation/x86/kaiser.txt                | 160 +++++
>  arch/x86/Kconfig                            |   8 +
>  arch/x86/entry/calling.h                    |  89 +++
>  arch/x86/entry/entry_64.S                   |  44 +-
>  arch/x86/entry/entry_64_compat.S            |   8 +
>  arch/x86/events/intel/ds.c                  |  49 +-
>  arch/x86/include/asm/cpufeatures.h          |   1 +
>  arch/x86/include/asm/desc.h                 |   2 +-
>  arch/x86/include/asm/kaiser.h               |  62 ++
>  arch/x86/include/asm/mmu_context.h          |  29 +-
>  arch/x86/include/asm/pgalloc.h              |  37 +-
>  arch/x86/include/asm/pgtable.h              |  20 +-
>  arch/x86/include/asm/pgtable_64.h           | 135 +++++
>  arch/x86/include/asm/pgtable_types.h        |  25 +-
>  arch/x86/include/asm/processor.h            |   2 +-
>  arch/x86/include/asm/tlbflush.h             | 232 +++++++-
>  arch/x86/include/uapi/asm/processor-flags.h |   3 +-
>  arch/x86/kernel/cpu/common.c                |  21 +-
>  arch/x86/kernel/espfix_64.c                 |  27 +-
>  arch/x86/kernel/head_64.S                   |  30 +-
>  arch/x86/kernel/ldt.c                       |  25 +-
>  arch/x86/kernel/process.c                   |   2 +-
>  arch/x86/kernel/process_64.c                |   2 +-
>  arch/x86/kernel/traps.c                     |  46 +-
>  arch/x86/kvm/x86.c                          |   3 +-
>  arch/x86/mm/Makefile                        |   1 +
>  arch/x86/mm/init.c                          |  75 ++-
>  arch/x86/mm/kaiser.c                        | 627 ++++++++++++++++++++
>  arch/x86/mm/pageattr.c                      |  18 +-
>  arch/x86/mm/pgtable.c                       |  16 +-
>  arch/x86/mm/tlb.c                           | 105 +++-
>  include/asm-generic/vmlinux.lds.h           |  17 +
>  include/linux/kaiser.h                      |  34 ++
>  include/linux/percpu-defs.h                 |  30 +
>  init/main.c                                 |   3 +
>  kernel/fork.c                               |   1 +
>  security/Kconfig                            |  10 +
>  37 files changed, 1851 insertions(+), 148 deletions(-)
>
> Cc: Moritz Lipp <moritz.lipp@xxxxxxxxxxxxxx>
> Cc: Daniel Gruss <daniel.gruss@xxxxxxxxxxxxxx>
> Cc: Michael Schwarz <michael.schwarz@xxxxxxxxxxxxxx>
> Cc: Richard Fellner <richard.fellner@xxxxxxxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Juergen Gross <jgross@xxxxxxxx>

I get a compilation error with:
CONFIG_RANDOMIZE_BASE=y

  OBJCOPY arch/x86/boot/compressed/vmlinux.bin
  RELOCS  arch/x86/boot/compressed/vmlinux.relocs
  CC      arch/x86/boot/compressed/early_serial_console.o
  CC      arch/x86/boot/compressed/kaslr.o
  CC      arch/x86/boot/compressed/pagetable.o
  CC      arch/x86/boot/compressed/misc.o
  GZIP    arch/x86/boot/compressed/vmlinux.bin.gz
  MKPIGGY arch/x86/boot/compressed/piggy.S
  AS      arch/x86/boot/compressed/piggy.o
  DATAREL arch/x86/boot/compressed/vmlinux
  LD      arch/x86/boot/compressed/vmlinux
arch/x86/boot/compressed/pagetable.o: In function `kernel_ident_mapping_init':
pagetable.c:(.text+0x31b): undefined reference to `kaiser_enabled'
arch/x86/boot/compressed/Makefile:106: recipe for target 'arch/x86/boot/compressed/vmlinux' failed
make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
arch/x86/boot/Makefile:112: recipe for target 'arch/x86/boot/compressed/vmlinux' failed
make[1]: *** [arch/x86/boot/compressed/vmlinux] Error 2
arch/x86/Makefile:295: recipe for target 'bzImage' failed
make: *** [bzImage] Error 2

Compiles fine with:
# CONFIG_RANDOMIZE_BASE is not set

...Juerg

> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>