Re: [PATCH 00/30] [v3] KAISER: unmap most of the kernel from userspace page tables

On Fri, Nov 10, 2017 at 8:30 PM, Dave Hansen
<dave.hansen@xxxxxxxxxxxxxxx> wrote:
> Thanks, everyone for all the reviews thus far.  I hope I managed to
> address all the feedback given so far, except for the TODOs of
> course.  This is a pretty minor update compared to v1->v2.
>
> These patches are all on top of Andy's entry changes here:
>
>         https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/entry_consolidation
>
> Changes from v2:
>  * Reword documentation removing "we"
>  * Fix some whitespace damage
>  * Fix up MAX ASID values off-by-one noted by Peter Z
>  * Change CodingStyle stuff from Borislav's comments
>  * Always use _KERNPG_TABLE for pmd_populate_kernel().
>
> Changes from v1:
>  * Updated to be on top of Andy L's new entry code
>  * Allow global pages again, and use them for pages mapped into
>    userspace page tables.
>  * Use trampoline stack instead of process stack at entry so no
>    longer need to map process stack (big win in fork() speed)
>  * Made the page table walking less generic by restricting it
>    to kernel addresses and !_PAGE_USER pages.
>  * Added a debugfs file to enable/disable CR3 switching at
>    runtime.  This does not remove all the KAISER overhead, but
>    it removes the largest source.
>  * Use runtime disable with Xen to permit Xen-PV guests with
>    KAISER=y.
>  * Moved assembly code from "core" to "prepare assembly" patch
>  * Pass full register name to asm macros
>  * Remove double stack switch in entry_SYSENTER_compat
>  * Disable vsyscall native case when KAISER=y
>  * Separate PER_CPU_USER_MAPPED generic definitions from use
>    by arch/x86/.
>
> TODO:
>  * Allow dumping the shadow page tables with the ptdump code
>  * Put LDT at top of userspace
>  * Create separate tlb flushing functions for user and kernel
>  * Chase down the source of the new !CR4.PGE warning that 0day
>    found with i386
>
> ---
>
> tl;dr:
>
> KAISER makes it harder to defeat KASLR, but makes syscalls and
> interrupts slower.  These patches are based on work from a team at
> Graz University of Technology posted here[1].  The major addition is
> support for Intel PCIDs which builds on top of Andy Lutomirski's PCID
> work merged for 4.14.  PCIDs make KAISER's overhead very reasonable
> for a wide variety of use cases.
>
> Full Description:
>
> KAISER is a countermeasure against attacks on kernel address
> information.  There are at least three existing, published,
> approaches using the shared user/kernel mapping and hardware features
> to defeat KASLR.  One approach referenced in the paper locates the
> kernel by observing differences in page fault timing between
> present-but-inaccessible kernel pages and non-present pages.
>
> KAISER addresses this by unmapping (most of) the kernel when
> userspace runs.  It leaves the existing page tables largely alone and
> refers to them as "kernel page tables".  For running userspace, a new
> "shadow" copy of the page tables is allocated for each process.  The
> shadow page tables map all the same user memory as the "kernel" copy,
> but map only a minimal set of kernel memory.
>
> When we enter the kernel via syscalls, interrupts or exceptions,
> page tables are switched to the full "kernel" copy.  When the system
> switches back to user mode, the "shadow" copy is used.  Process
> Context IDentifiers (PCIDs) are used to ensure that the TLB is not
> flushed when switching between page tables, which makes syscalls
> roughly 2x faster than without it.  PCIDs are usable on Haswell and
> newer CPUs (the ones with "v4", or called fourth-generation Core).
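
As a rough illustration of why tagging TLB entries helps here, the toy
model below counts TLB misses over a series of user/kernel round trips,
with and without PCIDs. It is only a sketch under simplifying
assumptions: one page touched on each side, no global pages, no
capacity evictions. The names ToyTLB, count_misses, KERNEL_PCID and
USER_PCID are hypothetical and do not come from the patch set.

```python
class ToyTLB:
    """Toy TLB whose entries are tagged with a PCID (illustrative model only)."""

    def __init__(self, use_pcid):
        self.use_pcid = use_pcid
        self.entries = set()      # cached (pcid, virtual_page) translations
        self.current_pcid = 0
        self.misses = 0

    def switch_page_tables(self, pcid):
        # Model assumption: without PCIDs, switching page tables drops
        # every cached translation; with PCIDs, entries stay cached
        # under their own tag.
        if not self.use_pcid:
            self.entries.clear()
        self.current_pcid = pcid if self.use_pcid else 0

    def access(self, virtual_page):
        key = (self.current_pcid, virtual_page)
        if key not in self.entries:
            self.misses += 1      # a page walk would be needed here
            self.entries.add(key)


def count_misses(use_pcid, syscalls=100):
    KERNEL_PCID, USER_PCID = 1, 2   # hypothetical tag values
    tlb = ToyTLB(use_pcid)
    for _ in range(syscalls):
        tlb.switch_page_tables(USER_PCID)     # running in userspace
        tlb.access("user_page")
        tlb.switch_page_tables(KERNEL_PCID)   # syscall entry
        tlb.access("kernel_page")
    return tlb.misses


print("misses with PCID:   ", count_misses(True))
print("misses without PCID:", count_misses(False))
```

In this model the PCID case misses only on the first touch of each
page, while the non-PCID case misses on every access after a switch.
Real hardware is far more complicated, so the ratio here says nothing
about actual syscall cost.
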
>
> The minimal kernel page tables try to map only what is needed to
> enter/exit the kernel such as the entry/exit functions, interrupt
> descriptors (IDT) and the kernel trampoline stacks.  This minimal set
> of data can still reveal the kernel's ASLR base address.  But, this
> minimal kernel data is all trusted, which makes it harder to exploit
> than data in the kernel direct map which contains loads of
> user-controlled data.
>
> KAISER will affect performance for anything that does system calls or
> interrupts: everything.  Just the new instructions (CR3 manipulation)
> add a few hundred cycles to a syscall or interrupt.  Most workloads
> that we have run show single-digit regressions.  5% is a good round
> number for what is typical.  The worst we have seen is a roughly 30%
> regression on a loopback networking test that did a ton of syscalls
> and context switches.  More details about possible performance
> impacts are in the new Documentation/ file.
>
> This code is based on a version I downloaded from
> (https://github.com/IAIK/KAISER).  It has been heavily modified.
>
> The approach is described in detail in a paper[2].  However, there is
> some incorrect information in the paper, both on how Linux and
> the hardware work.  For instance, I do not share the opinion that
> KAISER has "runtime overhead of only 0.28%".  Please rely on this
> patch series as the canonical source of information about this
> submission.
>
> Here is one example of how the kernel image grows with CONFIG_KAISER
> on and off.  Most of the size increase is presumably from additional
> alignment requirements for mapping entry/exit code and structures.
>
>     text    data     bss      dec filename
> 11786064 7356724 2928640 22071428 vmlinux-nokaiser
> 11798203 7371704 2928640 22098547 vmlinux-kaiser
>   +12139  +14980       0   +27119
>
> To give folks an idea what the performance impact is like, I took
> the following test and ran it single-threaded:
>
>         https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c
>
> It's a pretty quick syscall so this shows how much KAISER slows
> down syscalls (and how much PCIDs help).  The units here are
> lseeks/second:
>
>         no kaiser: 5.2M
>     kaiser+  pcid: 3.0M
>     kaiser+nopcid: 2.2M
>
> "nopcid" is literally with the "nopcid" command-line option which
> turns PCIDs off entirely.
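
To put the three figures above side by side, the small calculation
below converts them into a relative throughput drop and an approximate
extra cost per call. It uses only the quoted lseeks/second numbers.
The helper name overhead is hypothetical, and the per-call figure
assumes the single-threaded loop spends essentially all of its time in
lseek().

```python
RATES = {  # lseeks/second, as quoted above
    "no kaiser": 5.2e6,
    "kaiser+pcid": 3.0e6,
    "kaiser+nopcid": 2.2e6,
}


def overhead(baseline, rate):
    """Return (throughput drop in percent, extra nanoseconds per call)."""
    drop_pct = (1 - rate / baseline) * 100
    extra_ns = (1 / rate - 1 / baseline) * 1e9
    return drop_pct, extra_ns


for name in ("kaiser+pcid", "kaiser+nopcid"):
    drop, extra = overhead(RATES["no kaiser"], RATES[name])
    print(f"{name}: {drop:.0f}% fewer lseeks/s, ~{extra:.0f} ns extra per call")
```

By this arithmetic the PCID case loses about 42% of the baseline
throughput (roughly 141 ns more per call) and the nopcid case about
58% (roughly 262 ns more per call).
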
>
> Thanks to:
> The original KAISER team at Graz University of Technology.
> Andy Lutomirski for all the help with the entry code.
> Kirill Shutemov for a helpful review of the code.
>
> 1. https://github.com/IAIK/KAISER
> 2. https://gruss.cc/files/kaiser.pdf
>
> --
>
> The code is available here:
>
>         https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/
>
>  Documentation/x86/kaiser.txt                | 160 +++++
>  arch/x86/Kconfig                            |   8 +
>  arch/x86/entry/calling.h                    |  89 +++
>  arch/x86/entry/entry_64.S                   |  44 +-
>  arch/x86/entry/entry_64_compat.S            |   8 +
>  arch/x86/events/intel/ds.c                  |  49 +-
>  arch/x86/include/asm/cpufeatures.h          |   1 +
>  arch/x86/include/asm/desc.h                 |   2 +-
>  arch/x86/include/asm/kaiser.h               |  62 ++
>  arch/x86/include/asm/mmu_context.h          |  29 +-
>  arch/x86/include/asm/pgalloc.h              |  37 +-
>  arch/x86/include/asm/pgtable.h              |  20 +-
>  arch/x86/include/asm/pgtable_64.h           | 135 +++++
>  arch/x86/include/asm/pgtable_types.h        |  25 +-
>  arch/x86/include/asm/processor.h            |   2 +-
>  arch/x86/include/asm/tlbflush.h             | 232 +++++++-
>  arch/x86/include/uapi/asm/processor-flags.h |   3 +-
>  arch/x86/kernel/cpu/common.c                |  21 +-
>  arch/x86/kernel/espfix_64.c                 |  27 +-
>  arch/x86/kernel/head_64.S                   |  30 +-
>  arch/x86/kernel/ldt.c                       |  25 +-
>  arch/x86/kernel/process.c                   |   2 +-
>  arch/x86/kernel/process_64.c                |   2 +-
>  arch/x86/kernel/traps.c                     |  46 +-
>  arch/x86/kvm/x86.c                          |   3 +-
>  arch/x86/mm/Makefile                        |   1 +
>  arch/x86/mm/init.c                          |  75 ++-
>  arch/x86/mm/kaiser.c                        | 627 ++++++++++++++++++++
>  arch/x86/mm/pageattr.c                      |  18 +-
>  arch/x86/mm/pgtable.c                       |  16 +-
>  arch/x86/mm/tlb.c                           | 105 +++-
>  include/asm-generic/vmlinux.lds.h           |  17 +
>  include/linux/kaiser.h                      |  34 ++
>  include/linux/percpu-defs.h                 |  30 +
>  init/main.c                                 |   3 +
>  kernel/fork.c                               |   1 +
>  security/Kconfig                            |  10 +
>  37 files changed, 1851 insertions(+), 148 deletions(-)
>
> Cc: Moritz Lipp <moritz.lipp@xxxxxxxxxxxxxx>
> Cc: Daniel Gruss <daniel.gruss@xxxxxxxxxxxxxx>
> Cc: Michael Schwarz <michael.schwarz@xxxxxxxxxxxxxx>
> Cc: Richard Fellner <richard.fellner@xxxxxxxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Juergen Gross <jgross@xxxxxxxx>

I get a compilation error with:
CONFIG_RANDOMIZE_BASE=y

  OBJCOPY arch/x86/boot/compressed/vmlinux.bin
  RELOCS  arch/x86/boot/compressed/vmlinux.relocs
  CC      arch/x86/boot/compressed/early_serial_console.o
  CC      arch/x86/boot/compressed/kaslr.o
  CC      arch/x86/boot/compressed/pagetable.o
  CC      arch/x86/boot/compressed/misc.o
  GZIP    arch/x86/boot/compressed/vmlinux.bin.gz
  MKPIGGY arch/x86/boot/compressed/piggy.S
  AS      arch/x86/boot/compressed/piggy.o
  DATAREL arch/x86/boot/compressed/vmlinux
  LD      arch/x86/boot/compressed/vmlinux
arch/x86/boot/compressed/pagetable.o: In function `kernel_ident_mapping_init':
pagetable.c:(.text+0x31b): undefined reference to `kaiser_enabled'
arch/x86/boot/compressed/Makefile:106: recipe for target 'arch/x86/boot/compressed/vmlinux' failed
make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
arch/x86/boot/Makefile:112: recipe for target 'arch/x86/boot/compressed/vmlinux' failed
make[1]: *** [arch/x86/boot/compressed/vmlinux] Error 2
arch/x86/Makefile:295: recipe for target 'bzImage' failed
make: *** [bzImage] Error 2

Compiles fine with:
# CONFIG_RANDOMIZE_BASE is not set

...Juerg


> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
