Re: [PATCH 00/35] Shadow stacks for userspace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(added more CRIU people)

On Sun, Jan 30, 2022 at 01:18:03PM -0800, Rick Edgecombe wrote:
> Hi,
> 
> This is a slight reboot of the userspace CET series. I will be taking over the 
> series from Yu-cheng. Per some internal recommendations, I’ve reset the version
> number and am calling it a new series. Hopefully, it doesn’t cause confusion.
> 
> The new plan is to upstream only userspace Shadow Stack support at this point. 
> IBT can follow later, but for now I’ll focus solely on the most in-demand and
> widely available (with the feature on AMD CPUs now) part of CET.
> 
> I thought as part of this reset, it might be useful to more fully write-up the 
> design and summarize the history of the previous CET series. So this slightly
> long cover letter does that. The "Updates" section has the changes, if anyone
> doesn't want the history.
> 
> 
> Why is Shadow Stack Wanted
> ==========================
> The main use case for userspace shadow stack is providing protection against 
> return oriented programming attacks. Fedora and Ubuntu already have many/most 
> packages enabled for shadow stack. The main missing piece is Linux kernel 
> support and there seems to be a high amount of interest in the ecosystem for
> getting this feature supported. Besides security, Google has also done some
> work on using shadow stack to improve performance and reliability of tracing.
> 
> 
> Userspace Shadow Stack Implementation
> =====================================
> Shadow stack works by maintaining a secondary (shadow) stack that cannot be 
> directly modified by applications. When executing a CALL instruction, the 
> processor pushes the return address to both the normal stack and to the special 
> permissioned shadow stack. Upon ret, the processor pops the shadow stack copy 
> and compares it to the normal stack copy. If the two differ, the processor 
> raises a control protection fault. This implementation supports shadow stack on 
> 64 bit kernels only, with support for 32 bit only via IA32 emulation.
> 
> 	Shadow Stack Memory
> 	-------------------
> 	The majority of this series deals with changes for handling the special 
> 	shadow stack memory permissions. This memory is specified by the 
> 	Dirty+RO PTE bits. A tricky aspect of this is that this combination was 
> 	previously used to specify COW memory. So Linux needs to handle COW 
> 	differently when shadow stack is in use. The solution is to use a 
> 	software PTE bit to denote COW memory, and take care to clear the dirty
> 	bit when setting the memory RO.
> 
> 	Setup and Upkeep of HW Registers
> 	--------------------------------
> 	Using userspace CET requires a CR4 bit set, and also the manipulation 
> 	of two xsave managed MSRs. The kernel needs to modify these registers 
> 	during various operations like clone and signal handling. These 
> 	operations may happen when the registers are restored to the CPU, or 
> 	saved in an xsave buffer. Since the recent AMX triggered FPU overhaul 
> 	removed direct access to the xsave buffer, this series adds an 
> 	interface to operate on the supervisor xstate.
> 
> 	New ABIs
> 	--------
> 	This series introduces some new ABIs. The primary one is the shadow 
> 	stack itself. Since it is readable and the shadow stack pointer is 
> 	exposed to user space, applications can easily read and process the 
> 	shadow stack. And in fact the tracing usages plan to do exactly that.
> 
> 	Most of the shadow stack contents are written by HW, but some of the 
> 	entries are added by the kernel. The main place for this is signals. As 
> 	part of handling the signal the kernel does some manual adjustment of 
> 	the shadow stack that userspace depends on.
> 
> 	In addition to the contents of the shadow stack there is also user 
> 	visible behavior around when new shadow stacks are created and set in 
> 	the shadow stack pointer (SSP) register. This is relatively 
> 	straightforward – shadow stacks are created when new stacks are created 
> 	(thread creation, fork, etc). It is more or less what is required to 
> 	keep apps working.
> 
> 	For situations when userspace creates a new stack (i.e. makecontext(), 
> 	fibers, etc), a new syscall is provided for creating shadow stack 
> 	memory. To make the shadow stack usable, it needs to have a restore 
> 	token written to the protected memory. So the syscall provides a way to 
> 	specificity this should be done by the kernel.
> 
> 	When a shadow stack violation happens (when the return address of stack 
> 	not matching return address in shadow stack), a segfault is generated 
> 	with a new si_code specific to CET violations.
> 
> 	Lastly, a new arch_prctl interface is created for controlling the 
> 	enablement of CET-like features. It is intended to also be used for 
> 	LAM. It operates on the feature status per-thread, so for process wide 
> 	enabling it is intended to be used early in things like dynamic 
> 	linker/loaders. However, it can be used later for per-thread enablement 
> 	of features like WRSS.
> 
> 	WRSS
> 	----
> 	WRSS is an instruction that can write to shadow stacks. The HW provides 
> 	a way to enable this instruction for userspace use. Since shadow 
> 	stack’s are created initially protected, enabling WRSS allows any apps 
> 	that want to do unusual things with their stacks to have a way to 
> 	weaken protection and make things more flexible. A new feature bit is 
> 	defined to control enabling/disabling of WRSS.
> 
> 
> History
> =======
> The branding “CET” really consists of two features: “Shadow Stack” and 
> “Indirect Branch Tracking”. They both restrict previously allowed, but rarely 
> valid behaviors and require userspace to change to avoid these behaviors before 
> enabling the protection. These raw HW features need to be assembled into a 
> software solution across userspace and kernel in order to add security value.
> The kernel part of this solution has evolved iteratively starting with a lengthy
> RFC period. 
> 
> Until now, the enabling effort was trying to support both Shadow Stack and IBT. 
> This history will focus on a few areas of the shadow stack development history 
> that I thought stood out.
> 
> 	Signals
> 	-------
> 	Originally signals placed the location of the shadow stack restore 
> 	token inside the saved state on the stack. This was problematic from a 
> 	past ABI promises perspective. So the restore location was instead just 
> 	assumed from the shadow stack pointer. This works because in normal 
> 	allowed cases of calling sigreturn, the shadow stack pointer should be 
> 	right at the restore token at that time. There is no alternate shadow 
> 	stack support. If an alt shadow stack is added later we would need to 
> 	find a place to store the regular shadow stack token location. Options 
> 	could be to push something on the alt shadow stack, or to keep 
> 	something on the kernel side. So the current design keeps things simple 
> 	while slightly kicking the can down the road if alt shadow stacks 
> 	become a thing later. Siglongjmp is handled in glibc, using the incssp 
> 	instruction to unwind the shadow stack over the token.
> 
> 	Shadow Stack Allocation
> 	-----------------------
> 	makecontext() implementations need a way to create new shadow stacks 
> 	with restore token’s such that they can be pivoted to from userspace. 
> 	The first interface to do this was an arch_prctl(). It created a shadow 
> 	stack with a restore token pre-setup, since the kernel has an 
> 	instruction that can write to user shadow stacks. However, this 
> 	interface was abandoned for being strange.
> 
> 	The next version created PROT_SHADOW_STACK. This interface had two 
> 	problems. One, it left no options but for userspace to create writable 
> 	memory, write a restore token, then mproctect() it PROT_SHADOW_STACK. 
> 	The writable window left the shadow stack exposed, weakening the 
> 	security. Second, it caused problems with the guard pages. Since the 
> 	memory was initially created writable it did not have a guard page, but 
> 	then was mprotected later to a type of memory that should have one. 
> 	This resulted in missing guard pages and confused rb_subtree_gap’s.
> 
> 	This version introduces a new syscall that behaves similarly to the 
> 	initial arch_prctl() interface in that it has the kernel write the 
> 	restore token.
> 
> 	Enabling Interface
> 	------------------
> 	For the entire history of the original CET series, the design was to 
> 	enable shadow stack automatically if the feature bit was detected in 
> 	the elf header. Then it was userspace’s responsibility to turn it off 
> 	via an arch_prctl() if it was not desired, and this was handled by the 
> 	glibc dynamic loader. Glibc’s standard behavior (when CET if configured 
> 	is to leave shadow stack enabled if the executable and all linked 
> 	libraries are marked with shadow stacks.
> 
> 	Many distros (Fedora and others) have binaries already marked with 
> 	shadow stack, waiting for kernel support. Unfortunately their glibc 
> 	binaries expect the original arch_prctl() interface for allocating 
> 	shadow stacks, as those changes were pushed ahead of kernel support. 
> 	The net result of it all is, when updating to a kernel with shadow 
> 	stack these binaries would suddenly get shadow stack enabled and expect 
> 	the arch_prctl() interface to be there. And so calls to makecontext() 
> 	will fail, resulting in visible breakages. This series deals with this 
> 	problem as described below in "Updates".
> 
> 
> Updates
> =======
> These updates were mostly driven by public comments, but a lot of the design 
> elements are new. I would like some extra scrutiny on the updates.
> 
> 	New syscall for Shadow Stack Allocation
> 	---------------------------------------
> 	A new syscall is added for allocating shadow stacks to replace 
> 	PROT_SHADOW_STACK. Several options were considered, as described in the 
> 	“x86/cet/shstk: Introduce map_shadow_stack syscall”.
> 
> 	Xsave Managed Supervisor State Modifications
> 	--------------------------------------------
> 	The shadow stack feature requires the kernel to modify xsaves managed 
> 	state. On one of the last versions of Yu-cheng’s series Boris had 
> 	commented on the pattern it was using to do this not necessarily being 
> 	ideal. The pattern was to force a restore to the registers and always 
> 	do the modification there. Then Thomas did an overhaul of the fpu code, 
> 	part of which consisted of making raw access to the xsave buffer 
> 	private to the fpu code. So this series tries to expose access again, 
> 	and in a way that addresses Boris’ comments.
> 
> 	The method is to provide functions like wmsrl/rdmsrl, but that can 
> 	direct the operation to the correct location (registers or buffer), 
> 	while giving the proper notice to the fpu subsystem so things don’t get 
> 	clobbered or corrupted.
> 
> 	In the past a solution like this was discussed as part of the PASID 
> 	series, and Thomas was not in favor. In CET’s case there is a more 
> 	logic around the CET MSR’s than in PASID's, and wrapping this logic 
> 	minimizes near identical open coded logic needed to do this more 
> 	efficiently. In addition it resolves the above described problem of 
> 	having no access to the xsave buffer. So it is being put forward here 
> 	under the supposition that CET’s usage may lead to a different 
> 	conclusion, not to try to ignore past direction.
> 
> 	The user interrupt series has similar needs as CET, and will also use
> 	this internal interface if it’s found acceptable.
> 
> 	Support for WRSS
> 	----------------
> 	Andy Lutomirski had asked if we change the shadow stack allocation API 
> 	such that userspace cannot create arbitrary shadow stacks, then we look 
> 	at exposing an interface to enable the WRSS instruction for userspace. 
> 	This way app’s that want to do unexpected things with shadow stacks 
> 	would still have the option to create shadow stacks with arbitrary 
> 	data.
> 
> 	Switch Enabling Interface
> 	-------------------------
> 	As described above there is a problem with userspace binaries waiting 
> 	to break as soon as the kernel supports CET. This needs to be prevented 
> 	by changing the interface such that the old binaries will not enable 
> 	shadow stack AND behave as if shadow stack is not enabled. They should 
> 	run normally without shadow stack protection. Creating a new feature 
> 	(SHSTK2) for shadow stack was explored. SHSTK would never be supported 
> 	by the kernel, and all the userspace build tools would be updated to 
> 	target SHSTK2 instead of SHSTK. So old SHSTK binaries would be cleanly
> 	disabled.
> 
> 	But there are existing downsides to automatic elf header processing 
> 	based enabling. The elf header feature spec is not defined by the 
> 	kernel and there are proposals to expand it to describe additional 
> 	logic. A simpler interface where the kernel is simply told what to 
> 	enable, and leaves all the decision making to userspace, is more 
> 	flexible for userspace and simpler for the kernel. There also already 
> 	needs to be an ARCH_X86_FEATURE_ENABLE arch_prctl() for WRSS (and 
> 	likely LAM will use it too), so it avoids there being two ways to turn 
> 	on these types of features. The only tricky part for shadow stack, is 
> 	that it has to be enabled very early. Wherever the shadow stack is 
> 	enabled, the app cannot return from that point, otherwise there will be 
> 	a shadow stack violation. It turns out glibc can enable shadow stack 
> 	this early, so it works nicely. So not automatically enabling any 
> 	features in the elf header will cleanly disable all old binaries, which 
> 	expect the kernel to enable CET features automatically. Then after the 
> 	kernel changes are upstream, glibc can be updated to use the new
> 	interface. This is the solution implemented in this series.
> 
> 	Expand Commit Logs
> 	------------------
> 	As part of spinning up on this series, I found some of the commit logs 
> 	did not describe the changes in enough detail for me understand their 
> 	purpose. I tried to expand the logs and comments, where I had to go 
> 	digging. Hopefully it’s useful.
> 	
> 	Limit to only Intel Processors
> 	------------------------------
> 	Shadow stack is supported on some AMD processors, but this revision 
> 	(with expanded HW usage and xsaves changes) has only has been tested on 
> 	Intel ones. So this series has a patch to limit shadow stack support to 
> 	Intel processors. Ideally the patch would not even make it to mainline, 
> 	and should be dropped as soon as this testing is done. It's included 
> 	just in case.
> 
> 
> Future Work
> ===========
> Even though this is now exclusively a shadow stack series, there is still some 
> remaining shadow stack work to be done.
> 
> 	Ptrace
> 	------
> 	Early in the series, there was a patch to allow IA32_U_CET and
> 	IA32_PL3_SSP to be set. This patch was dropped and planned as a follow
> 	up to basic support, and it remains the plan. It will be needed for
> 	in-progress gdb support.
> 
> 	CRIU Support
> 	------------
> 	In the past there was some speculation on the mailing list about 
> 	whether CRIU would need to be taught about CET. It turns out, it does. 
> 	The first issue hit is that CRIU calls sigreturn directly from its 
> 	“parasite code” that it injects into the dumper process. This violates
> 	this shadow stack implementation’s protection that intends to prevent
> 	attackers from doing this.
> 
> 	With so many packages already enabled with shadow stack, there is 
> 	probably desire to make it work seamlessly. But in the meantime if 
> 	distros want to support shadow stack and CRIU, users could manually 
> 	disabled shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for 
> 	a process they will wants to dump. It’s not ideal.
> 
> 	I’d like to hear what people think about having shadow stack in the 
> 	kernel without this resolved. Nothing would change for any users until 
> 	they enable shadow stack in the kernel and update to a glibc configured
> 	with CET. Should CRIU userspace be solved before kernel support?
> 
> 	Selftests
> 	---------
> 	There are some CET selftests being worked on and they are not included
> 	here.
> 
> Thanks,
> 
> Rick
> 
> Rick Edgecombe (7):
>   x86/mm: Prevent VM_WRITE shadow stacks
>   x86/fpu: Add helpers for modifying supervisor xstate
>   x86/fpu: Add unsafe xsave buffer helpers
>   x86/cet/shstk: Introduce map_shadow_stack syscall
>   selftests/x86: Add map_shadow_stack syscall test
>   x86/cet/shstk: Support wrss for userspace
>   x86/cpufeatures: Limit shadow stack to Intel CPUs
> 
> Yu-cheng Yu (28):
>   Documentation/x86: Add CET description
>   x86/cet/shstk: Add Kconfig option for Shadow Stack
>   x86/cpufeatures: Add CET CPU feature flags for Control-flow
>     Enforcement Technology (CET)
>   x86/cpufeatures: Introduce CPU setup and option parsing for CET
>   x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
>   x86/cet: Add control-protection fault handler
>   x86/mm: Remove _PAGE_DIRTY from kernel RO pages
>   x86/mm: Move pmd_write(), pud_write() up in the file
>   x86/mm: Introduce _PAGE_COW
>   drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
>   x86/mm: Update pte_modify for _PAGE_COW
>   x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for
>     transition from _PAGE_DIRTY to _PAGE_COW
>   mm: Move VM_UFFD_MINOR_BIT from 37 to 38
>   mm: Introduce VM_SHADOW_STACK for shadow stack memory
>   x86/mm: Check Shadow Stack page fault errors
>   x86/mm: Update maybe_mkwrite() for shadow stack
>   mm: Fixup places that call pte_mkwrite() directly
>   mm: Add guard pages around a shadow stack.
>   mm/mmap: Add shadow stack pages to memory accounting
>   mm: Update can_follow_write_pte() for shadow stack
>   mm/mprotect: Exclude shadow stack from preserve_write
>   mm: Re-introduce vm_flags to do_mmap()
>   x86/cet/shstk: Add user-mode shadow stack support
>   x86/process: Change copy_thread() argument 'arg' to 'stack_size'
>   x86/cet/shstk: Handle thread shadow stack
>   x86/cet/shstk: Introduce shadow stack token setup/verify routines
>   x86/cet/shstk: Handle signals for shadow stack
>   x86/cet/shstk: Add arch_prctl elf feature functions
> 
>  .../admin-guide/kernel-parameters.txt         |   4 +
>  Documentation/filesystems/proc.rst            |   1 +
>  Documentation/x86/cet.rst                     | 145 ++++++
>  Documentation/x86/index.rst                   |   1 +
>  arch/arm/kernel/signal.c                      |   2 +-
>  arch/arm64/kernel/signal.c                    |   2 +-
>  arch/arm64/kernel/signal32.c                  |   2 +-
>  arch/sparc/kernel/signal32.c                  |   2 +-
>  arch/sparc/kernel/signal_64.c                 |   2 +-
>  arch/x86/Kconfig                              |  22 +
>  arch/x86/Kconfig.assembler                    |   5 +
>  arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
>  arch/x86/ia32/ia32_signal.c                   |  25 +-
>  arch/x86/include/asm/cet.h                    |  54 +++
>  arch/x86/include/asm/cpufeatures.h            |   1 +
>  arch/x86/include/asm/disabled-features.h      |   8 +-
>  arch/x86/include/asm/fpu/api.h                |   8 +
>  arch/x86/include/asm/fpu/types.h              |  23 +-
>  arch/x86/include/asm/fpu/xstate.h             |   6 +-
>  arch/x86/include/asm/idtentry.h               |   4 +
>  arch/x86/include/asm/mman.h                   |  24 +
>  arch/x86/include/asm/mmu_context.h            |   2 +
>  arch/x86/include/asm/msr-index.h              |  20 +
>  arch/x86/include/asm/page_types.h             |   7 +
>  arch/x86/include/asm/pgtable.h                | 302 ++++++++++--
>  arch/x86/include/asm/pgtable_types.h          |  48 +-
>  arch/x86/include/asm/processor.h              |   6 +
>  arch/x86/include/asm/special_insns.h          |  30 ++
>  arch/x86/include/asm/trap_pf.h                |   2 +
>  arch/x86/include/uapi/asm/mman.h              |   8 +-
>  arch/x86/include/uapi/asm/prctl.h             |  10 +
>  arch/x86/include/uapi/asm/processor-flags.h   |   2 +
>  arch/x86/kernel/Makefile                      |   1 +
>  arch/x86/kernel/cpu/common.c                  |  20 +
>  arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
>  arch/x86/kernel/elf_feature_prctl.c           |  72 +++
>  arch/x86/kernel/fpu/xstate.c                  | 167 ++++++-
>  arch/x86/kernel/idt.c                         |   4 +
>  arch/x86/kernel/process.c                     |  17 +-
>  arch/x86/kernel/process_64.c                  |   2 +
>  arch/x86/kernel/shstk.c                       | 446 ++++++++++++++++++
>  arch/x86/kernel/signal.c                      |  13 +
>  arch/x86/kernel/signal_compat.c               |   2 +-
>  arch/x86/kernel/traps.c                       |  62 +++
>  arch/x86/mm/fault.c                           |  19 +
>  arch/x86/mm/mmap.c                            |  48 ++
>  arch/x86/mm/pat/set_memory.c                  |   2 +-
>  arch/x86/mm/pgtable.c                         |  25 +
>  drivers/gpu/drm/i915/gvt/gtt.c                |   2 +-
>  fs/aio.c                                      |   2 +-
>  fs/proc/task_mmu.c                            |   3 +
>  include/linux/mm.h                            |  19 +-
>  include/linux/pgtable.h                       |   8 +
>  include/linux/syscalls.h                      |   1 +
>  include/uapi/asm-generic/siginfo.h            |   3 +-
>  include/uapi/asm-generic/unistd.h             |   2 +-
>  ipc/shm.c                                     |   2 +-
>  kernel/sys_ni.c                               |   1 +
>  mm/gup.c                                      |  16 +-
>  mm/huge_memory.c                              |  27 +-
>  mm/memory.c                                   |   5 +-
>  mm/migrate.c                                  |   3 +-
>  mm/mmap.c                                     |  15 +-
>  mm/mprotect.c                                 |   9 +-
>  mm/nommu.c                                    |   4 +-
>  mm/util.c                                     |   2 +-
>  tools/testing/selftests/x86/Makefile          |   9 +-
>  .../selftests/x86/test_map_shadow_stack.c     |  75 +++
>  69 files changed, 1797 insertions(+), 92 deletions(-)
>  create mode 100644 Documentation/x86/cet.rst
>  create mode 100644 arch/x86/include/asm/cet.h
>  create mode 100644 arch/x86/include/asm/mman.h
>  create mode 100644 arch/x86/kernel/elf_feature_prctl.c
>  create mode 100644 arch/x86/kernel/shstk.c
>  create mode 100644 tools/testing/selftests/x86/test_map_shadow_stack.c
> 
> 
> base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
> -- 
> 2.17.1

-- 
Sincerely yours,
Mike.



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux