[PATCH 00/37 v6] PTI support for x86-32

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

here is the new version of my PTI patches for x86-32 which
implement last weeks review comments.

Changes to v5 are:

	* Rebased to v4.17-rc2

	* Removed the protection changes between memory
	  areas mapped in the kernel and user page-tables
	  with global bit set

	* Added kernel text mapping to the user-space
	  page-table as it is done on x86-64 to gain
	  performance

	* Measured the performance again

The result of the changes are two small new patches, namely
patches 27 and 28 which implement most of the above. I also
removed the GLB bit clearing from patch 26.

Here are the new performance numbers, I used the same
benchmark that Ingo suggested to use for v2. It actually
shows quite some improvement over the numbers gathered back
then. The reason is most likely the global kernel
text-mapping added to the user page-table. In particular,
the numbers are:

For 'perf stat --null --sync --repeat 50 perf bench sched messaging -g 20':

v4.17-rc2          : 0.306761370 seconds time elapsed                                          ( +-  0.93% )
pti-x32-v6 pti=on  : 0.406391420 seconds time elapsed                                          ( +-  0.45% )
pti-x32-v6 pti=off : 0.306383858 seconds time elapsed                                          ( +-  0.90% )

and for 'perf stat --null --sync --repeat 50 perf bench sched messaging -g 20 -t':

v4.17-rc2          : 0.299934984 seconds time elapsed                                          ( +-  1.00% )
pti-x32-v6 pti=on  : 0.379535803 seconds time elapsed                                          ( +-  0.81% )
pti-x32-v6 pti=off : 0.297920551 seconds time elapsed                                          ( +-  1.12% )

So the slowdown is around 32.5% for the non-threaded test
vs. 26.5% for the threaded test. That is quite an
improvement over v2, where the slowdown for the non-threaded
test was at 57%.

The difference between v4.17-rc2 and the pti=off kernel is
in the noise, I wasn't able to measure a reliable slowdown.

For the global bit settings, all page-tables have now
identical regions mapped with global bits:

 # grep GLB /sys/kernel/debug/page_tables/kernel > kernel
 # grep GLB /sys/kernel/debug/page_tables/current_user > current_user
 # grep GLB /sys/kernel/debug/page_tables/current_kernel > current_kernel
 # sha1sum *
15820e407be3650cf705a26e2291ef1a6ef1bce0  current_kernel
15820e407be3650cf705a26e2291ef1a6ef1bce0  current_user
15820e407be3650cf705a26e2291ef1a6ef1bce0  kernel

In particular, the regions mapped with global bits are the
kernel text mapping and the cpu entry area of the address
space.

This patch-set also got similar testing like the previous
ones. I did the the load-test with 'perf top', several x86
self-tests and a -j16 kernel compile in parallel and a loop
for a couple of hours without any issues. Further I
boot-tested non-pae and also 64bit configs with these
patches.

As with previous versions there is also a branch for people
to test:

	git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v6

And the previous versions are also still around and can be
found at:

	* For v5:
	  Post : https://marc.info/?l=linux-kernel&m=152389297705480&w=2
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v5

	* For v4:
	  Post : https://marc.info/?l=linux-kernel&m=152122860630236&w=2
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v4

	* For v3:
	  Post : https://marc.info/?l=linux-kernel&m=152024559419876&w=2
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v3

	* For v2:
	  Post : https://marc.info/?l=linux-kernel&m=151816914932088&w=2
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v2

Please review.

Thanks,

	Joerg

Joerg Roedel (37):
  x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c
  x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry_stack
  x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler
  x86/entry/32: Put ESPFIX code into a macro
  x86/entry/32: Unshare NMI return path
  x86/entry/32: Split off return-to-kernel path
  x86/entry/32: Enter the kernel via trampoline stack
  x86/entry/32: Leave the kernel via trampoline stack
  x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI
  x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  x86/entry/32: Simplify debug entry point
  x86/32: Use tss.sp1 as cpu_current_top_of_stack
  x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  x86/entry/32: Add PTI cr3 switches to NMI handler code
  x86/pgtable: Rename pti_set_user_pgd to pti_set_user_pgtbl
  x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled
  x86/pgtable/32: Allocate 8k page-tables when PTI is enabled
  x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h
  x86/pgtable: Move pti_set_user_pgtbl() to pgtable.h
  x86/pgtable: Move two more functions from pgtable_64.h to pgtable.h
  x86/mm/pae: Populate valid user PGD entries
  x86/mm/pae: Populate the user page-table with user pgd's
  x86/mm/legacy: Populate the user page-table with user pgd's
  x86/mm/pti: Add an overflow check to pti_clone_pmds()
  x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
  x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32
  x86/mm/pti: Keep permissions when cloning kernel text in
    pti_clone_kernel_text()
  x86/mm/pti: Map kernel-text to user-space on 32 bit kernels
  x86/mm/dump_pagetables: Define INIT_PGD
  x86/pgtable/pae: Use separate kernel PMDs for user page-table
  x86/ldt: Reserve address-space range on 32 bit for the LDT
  x86/ldt: Define LDT_END_ADDR
  x86/ldt: Split out sanity check in map_ldt_struct()
  x86/ldt: Enable LDT user-mapping for PAE
  x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32
  x86/mm/pti: Add Warning when booting on a PCID capable CPU
  x86/entry/32: Add debug code to check entry/exit cr3

 arch/x86/Kconfig.debug                      |  12 +
 arch/x86/entry/entry_32.S                   | 640 +++++++++++++++++++++++-----
 arch/x86/include/asm/mmu_context.h          |   5 -
 arch/x86/include/asm/pgtable-2level.h       |   9 +
 arch/x86/include/asm/pgtable-2level_types.h |   3 +
 arch/x86/include/asm/pgtable-3level.h       |   7 +
 arch/x86/include/asm/pgtable-3level_types.h |   6 +-
 arch/x86/include/asm/pgtable.h              |  87 ++++
 arch/x86/include/asm/pgtable_32.h           |   2 -
 arch/x86/include/asm/pgtable_32_types.h     |   9 +-
 arch/x86/include/asm/pgtable_64.h           |  89 +---
 arch/x86/include/asm/pgtable_64_types.h     |   3 +
 arch/x86/include/asm/pgtable_types.h        |  28 +-
 arch/x86/include/asm/processor-flags.h      |   8 +-
 arch/x86/include/asm/processor.h            |   4 -
 arch/x86/include/asm/switch_to.h            |   6 +-
 arch/x86/include/asm/thread_info.h          |   2 -
 arch/x86/kernel/asm-offsets.c               |   5 +
 arch/x86/kernel/asm-offsets_32.c            |   2 +-
 arch/x86/kernel/asm-offsets_64.c            |   2 -
 arch/x86/kernel/cpu/common.c                |   9 +-
 arch/x86/kernel/head_32.S                   |  20 +-
 arch/x86/kernel/ldt.c                       | 137 ++++--
 arch/x86/kernel/process.c                   |   2 -
 arch/x86/kernel/process_32.c                |   4 +-
 arch/x86/mm/dump_pagetables.c               |  21 +-
 arch/x86/mm/init_32.c                       |   6 +
 arch/x86/mm/pgtable.c                       | 105 ++++-
 arch/x86/mm/pti.c                           |  44 +-
 security/Kconfig                            |   2 +-
 30 files changed, 974 insertions(+), 305 deletions(-)

-- 
2.7.4




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux