On 4/6/2018 3:55 PM, Dave Hansen wrote:
> Changes from v4
>  * Fix compile error reported by Tom Lendacky

This built with CONFIG_RANDOMIZE_BASE=y, but failed to boot successfully.
I think you're missing the initialization of __default_kernel_pte_mask in
kaslr.c.

Thanks,
Tom

>  * Avoid setting _PAGE_GLOBAL on non-present entries
>
> Changes from v3:
>  * Fix whitespace issue noticed by willy
>  * Clarify comments about X86_FEATURE_PGE checks
>  * Clarify commit message around the necessity of _PAGE_GLOBAL
>    filtering when CR4.PGE=0 or PGE is unsupported.
>
> Changes from v2:
>
>  * Add performance numbers to changelogs
>  * Fix compile error resulting from use of x86-specific
>    __default_kernel_pte_mask in arch-generic mm/early_ioremap.c
>  * Delay kernel text cloning until after we are done messing
>    with it (patch 11).
>  * Blacklist K8 explicitly from mapping all kernel text as
>    global (this should never happen because K8 does not use
>    pti when pti=auto, but we are on the safe side). (patch 11)
>
> --
>
> The later versions of the KAISER patches (pre-PTI) allowed the
> user/kernel shared areas to be GLOBAL.  The thought was that this would
> reduce the TLB overhead of keeping two copies of these mappings.
>
> During the switch over to PTI, we seem to have lost our ability to have
> GLOBAL mappings.  This adds them back.
>
> To measure the benefits of this, I took a modern Atom system without
> PCIDs and ran a microbenchmark[1] (higher is better):
>
> No Global Lines (baseline  ): 6077741 lseeks/sec
> 88 Global Lines (kern entry): 7528609 lseeks/sec (+23.9%)
> 94 Global Lines (all ktext ): 8433111 lseeks/sec (+38.8%)
>
> On a modern Skylake desktop with PCIDs, the benefits are tangible, but not
> huge:
>
> No Global pages (baseline): 15783951 lseeks/sec
> 28 Global pages (this set): 16054688 lseeks/sec
>                             +270737 lseeks/sec (+1.71%)
>
> I also double-checked with a kernel compile on the Skylake system (lower
> is better):
>
> No Global pages (baseline): 186.951 seconds time elapsed  ( +-  0.35% )
> 28 Global pages (this set): 185.756 seconds time elapsed  ( +-  0.09% )
>                              -1.195 seconds (-0.64%)
>
> 1. https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c
>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Nadav Amit <namit@xxxxxxxxxx>