Re: [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks

Dave Hansen <dave.hansen@xxxxxxxxx> · Fri, 11 Feb 2022 14:19:49 -0800

On 1/30/22 13:18, Rick Edgecombe wrote:
> Shadow stack accesses are writes from handle_mm_fault() perspective. So to
> generate the correct PTE, maybe_mkwrite() will rely on the presence of
> VM_SHADOW_STACK or VM_WRITE in the vma.
> 
> In future patches, when VM_SHADOW_STACK is actually creatable by
> userspace, a problem could happen if a user calls
> mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The code
> would then be confused in the event of shadow stack accesses, and create a
> writable PTE for a shadow stack access. Then the process would fault in a
> loop.
> 
> Prevent this from happening by blocking this kind of memory (VM_WRITE and
> VM_SHADOW_STACK) from being created, instead of complicating the fault
> handler logic to handle it.
> 
> Add an x86 arch_validate_flags() implementation to handle the check.
> Rename the uapi/asm/mman.h header guard to be able to use it for
> arch/x86/include/asm/mman.h where the arch_validate_flags() will be.

It would be great if this also said:

	There is an existing arch_validate_flags() hook for mmap() and
	mprotect() which allows architectures to reject unwanted
	->vm_flags combinations.  Add an implementation for x86.

That's somewhat implied by what is there already, but making it explicit
would be nice.  There's a much higher bar to add a new arch hook than to
just implement an existing one.
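
To illustrate the distinction between adding a hook and implementing one,
here is a minimal userspace sketch of the override pattern.  It is a model,
not the kernel source: VM_SHADOW_STACK's bit value and the
check_new_vm_flags() caller are hypothetical stand-ins, and the "arch" and
"generic" halves are squeezed into one file.

```c
#include <assert.h>   /* only needed if you add assert()-based checks */
#include <errno.h>
#include <stdbool.h>

#define VM_WRITE        0x00000002UL
#define VM_SHADOW_STACK 0x00010000UL	/* hypothetical bit, for illustration */

/* "Arch" side: provide the hook and advertise it with a same-named macro. */
static inline bool arch_validate_flags(unsigned long vm_flags)
{
	/* Reject a shadow stack that is also conventionally writable. */
	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
		return false;

	return true;
}
#define arch_validate_flags arch_validate_flags

/* "Generic" side: accept everything unless an arch supplied the hook. */
#ifndef arch_validate_flags
static inline bool arch_validate_flags(unsigned long vm_flags)
{
	(void)vm_flags;
	return true;
}
#define arch_validate_flags arch_validate_flags
#endif

/* Hypothetical caller standing in for the mmap()/mprotect() paths. */
static inline int check_new_vm_flags(unsigned long vm_flags)
{
	return arch_validate_flags(vm_flags) ? 0 : -EINVAL;
}
```

With these definitions, check_new_vm_flags(VM_SHADOW_STACK | VM_WRITE)
returns -EINVAL, while either flag on its own is accepted.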

> diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
> new file mode 100644
> index 000000000000..b44fe31deb3a
> --- /dev/null
> +++ b/arch/x86/include/asm/mman.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_MMAN_H
> +#define _ASM_X86_MMAN_H
> +
> +#include <linux/mm.h>
> +#include <uapi/asm/mman.h>
> +
> +#ifdef CONFIG_X86_SHADOW_STACK
> +static inline bool arch_validate_flags(unsigned long vm_flags)
> +{
> +	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
> +		return false;
> +
> +	return true;
> +}

The design decision here seems to be that VM_SHADOW_STACK is itself a
pseudo-VM_WRITE flag.  Like you said: "Shadow stack accesses are writes
from handle_mm_fault() perspective".

Very early on, this series seems to have made the decision that shadow
stacks are writable and need lots of write handling behavior, *BUT*
shouldn't have VM_WRITE set.  As a whole, that seems odd.

The alternative would be *requiring* VM_WRITE and VM_SHADOW_STACK be set
together.  I guess the downside is that pte_mkwrite() would need to be
made to work on shadow stack PTEs.
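
To make the two options concrete, here is a rough sketch of both as flag
predicates.  The function names and the VM_SHADOW_STACK bit value are made
up for illustration, and this says nothing about the pte_mkwrite() work the
second option would need.

```c
#include <assert.h>   /* only needed if you add assert()-based checks */
#include <stdbool.h>

#define VM_WRITE        0x00000002UL
#define VM_SHADOW_STACK 0x00010000UL	/* hypothetical bit, for illustration */

/* What the patch does: the two flags may never appear together. */
static inline bool validate_exclusive(unsigned long vm_flags)
{
	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
		return false;

	return true;
}

/* The alternative: a shadow stack VMA must also carry VM_WRITE. */
static inline bool validate_paired(unsigned long vm_flags)
{
	if ((vm_flags & VM_SHADOW_STACK) && !(vm_flags & VM_WRITE))
		return false;

	return true;
}
```

The two predicates disagree only on VMAs that have VM_SHADOW_STACK set:
validate_exclusive() accepts VM_SHADOW_STACK alone and rejects the pair,
validate_paired() does the opposite.  Ordinary VM_WRITE mappings pass both.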

That particular design decision was never discussed.  I think it has a
really big impact on the rest of the series.  What do you think?  Was it
a good idea?  Or would the alternative be more complicated than what you
have now?