On Tue, 25 Jun 2024 at 11:12, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> But yes, it would be lovely if we did things as "implement the
> low-level accessor functions and we'll wrap them for the generic case"
> rather than have every architecture have to do the wrapping..

Btw, to do that _well_, we need to expand on the user access functions
a bit more.

In particular, we can already implement "get_user()" on top of
user_access_begin() and friends something like this:

	#define get_user(x,ptr) ({				\
		__label__ Efault_read;				\
		long __err;					\
		__typeof__(ptr) __ptr = (ptr);			\
		if (likely(user_access_begin(__ptr, sizeof(x)))) { \
			unsafe_get_user(x, __ptr, Efault_read);	\
			user_access_end();			\
			__err = 0;				\
		} else {					\
			if (0) {				\
				Efault_read: user_access_end();	\
			}					\
			x = (__typeof__(x))0;			\
			__err = -EFAULT;			\
		}						\
		__err; })

and it doesn't generate _horrible_ code. It looks pretty bad, but all
the error handling goes out-of-line, so on x86-64 (without debug
options) it generates code something like this:

	test   %rdi,%rdi
	js     <cap_validate_magic+0x98>
	stac
	lfence
	mov    (%rdi),%ecx
	clac

which is certainly not horrid. But it's not great, because that lfence
ends up really screwing up the pipeline.

The manually coded out-of-line code generates this instead:

	mov    %rax,%rdx
	sar    $0x3f,%rdx
	or     %rdx,%rax
	stac
	movzbl (%rax),%edx
	xor    %eax,%eax
	clac
	ret

because it doesn't do a conditional branch (with the required spectre
thing), but instead does the address as a data dependency and knows
that "all bits set" if the address was negative will cause a page
fault.

But we *can* get the user accesses to do the same with a slight
expansion of user_access_begin():

	stac
	mov    %rdi,%rax
	sar    $0x3f,%rax
	or     %rdi,%rax
	mov    (%rax),%eax
	clac

by just introducing a notion of "masked_user_access". The get_user()
macro part would look like this:

	__typeof__(ptr) __ptr;				\
	__ptr = masked_user_access_begin(ptr);		\
	if (1) {					\
		unsafe_get_user(x, __ptr, Efault_read);	\
		user_access_end();			\

and the patch to implement this is attached. I've had it around for a
while, but I don't know how many architectures can do this.

Note this part of the commit message:

    This model only works for dense accesses that start at 'src' and on
    architectures that have a guard region that is guaranteed to fault
    in between the user space and the kernel space area.

which is true on x86-64, but that "guard region" thing might not be
true everywhere.

Will/Catalin - would that

	src = masked_user_access_begin(src);

work on arm64? The code does do something like that with
__uaccess_mask_ptr() already, but at least currently it doesn't do the
"avoid conditional entirely", the masking is just in _addition_ to the
access_ok().

            Linus
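Putting those pieces together, a complete get_user() built on the masked
access would look roughly like the sketch below. This is only an
illustration, not part of the attached patch: it combines the macro
fragment above with the can_do_masked_user_access() /
masked_user_access_begin() interfaces the patch introduces, and keeps the
conditional user_access_begin() version as the fallback for architectures
without the guard-region property:

	/* Illustrative sketch only, not part of the attached patch */
	#define get_user(x, ptr) ({					\
		__label__ Efault_read;					\
		long __err = 0;						\
		__typeof__(ptr) __ptr = (ptr);				\
		if (can_do_masked_user_access()) {			\
			/* data-dependent masking, no conditional */	\
			__ptr = masked_user_access_begin(__ptr);	\
			unsafe_get_user(x, __ptr, Efault_read);		\
			user_access_end();				\
		} else if (likely(user_access_begin(__ptr, sizeof(x)))) { \
			unsafe_get_user(x, __ptr, Efault_read);		\
			user_access_end();				\
		} else {						\
			if (0) {					\
				/* out-of-line fault handling */	\
				Efault_read: user_access_end();		\
			}						\
			x = (__typeof__(x))0;				\
			__err = -EFAULT;				\
		}							\
		__err; })

The fault label is declared local to the statement expression, so the
unsafe_get_user() in either branch can jump into the shared out-of-line
error handling in the final else block, exactly as in the conditional
version earlier in this mail.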
From 6b2c9a69efc21b9e6e0497a5661273f6fbe204b2 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Apr 2024 20:04:58 -0700
Subject: [PATCH] x86: support user address masking instead of non-speculative
 conditional

The Spectre-v1 mitigations made "access_ok()" much more expensive, since
it has to serialize execution with the test for a valid user address.

All the normal user copy routines avoid this by just masking the user
address with a data-dependent mask instead, but the fast
"unsafe_user_read()" kind of patterns that were supposed to be a fast
case got slowed down.

This introduces a notion of using

	src = masked_user_access_begin(src);

to do the user address sanity check using a data-dependent mask instead
of the more traditional conditional

	if (user_read_access_begin(src, len)) {

model.

This model only works for dense accesses that start at 'src' and on
architectures that have a guard region that is guaranteed to fault in
between the user space and the kernel space area.

With this, the user access doesn't need to be manually checked, because
a bad address is guaranteed to fault (by some architecture masking
trick: on x86-64 this involves just turning an invalid user address
into all ones, since we don't map the top of address space).

This only converts a couple of examples for now. Example x86-64 code
generation for loading two words from user space:

	stac
	mov    %rax,%rcx
	sar    $0x3f,%rcx
	or     %rax,%rcx
	mov    (%rcx),%r13
	mov    0x8(%rcx),%r14
	clac

where all the error handling and -EFAULT is now purely handled out of
line by the exception path.

Of course, if the micro-architecture does badly at 'clac' and 'stac',
the above is still pitifully slow. But at least we did as well as we
could.

Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
 arch/x86/include/asm/uaccess_64.h | 8 ++++++++
 fs/select.c                       | 4 +++-
 include/linux/uaccess.h           | 7 +++++++
 lib/strncpy_from_user.c           | 9 +++++++++
 lib/strnlen_user.c                | 9 +++++++++
 5 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 04789f45ab2b..a10149a96d9e 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -53,6 +53,14 @@ static inline unsigned long __untagged_addr_remote(struct mm_struct *mm,
  */
 #define valid_user_address(x) ((__force long)(x) >= 0)
 
+/*
+ * Masking the user address is an alternative to a conditional
+ * user_access_begin that can avoid the fencing. This only works
+ * for dense accesses starting at the address.
+ */
+#define mask_user_address(x) ((typeof(x))((long)(x)|((long)(x)>>63)))
+#define masked_user_access_begin(x) ({ __uaccess_begin(); mask_user_address(x); })
+
 /*
  * User pointers can have tag bits on x86-64. This scheme tolerates
  * arbitrary values in those bits rather then masking them off.
diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..bc185d111436 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -780,7 +780,9 @@ static inline int get_sigset_argpack(struct sigset_argpack *to,
 {
 	// the path is hot enough for overhead of copy_from_user() to matter
 	if (from) {
-		if (!user_read_access_begin(from, sizeof(*from)))
+		if (can_do_masked_user_access())
+			from = masked_user_access_begin(from);
+		else if (!user_read_access_begin(from, sizeof(*from)))
 			return -EFAULT;
 		unsafe_get_user(to->p, &from->p, Efault);
 		unsafe_get_user(to->size, &from->size, Efault);
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 3064314f4832..f18371f6cf36 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -32,6 +32,13 @@
 })
 #endif
 
+#ifdef masked_user_access_begin
+ #define can_do_masked_user_access() 1
+#else
+ #define can_do_masked_user_access() 0
+ #define masked_user_access_begin(src) NULL
+#endif
+
 /*
  * Architectures should provide two primitives (raw_copy_{to,from}_user())
  * and get rid of their private instances of copy_{to,from}_user() and
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index 6432b8c3e431..989a12a67872 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -120,6 +120,15 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
 	if (unlikely(count <= 0))
 		return 0;
 
+	if (can_do_masked_user_access()) {
+		long retval;
+
+		src = masked_user_access_begin(src);
+		retval = do_strncpy_from_user(dst, src, count, count);
+		user_read_access_end();
+		return retval;
+	}
+
 	max_addr = TASK_SIZE_MAX;
 	src_addr = (unsigned long)untagged_addr(src);
 	if (likely(src_addr < max_addr)) {
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index feeb935a2299..6e489f9e90f1 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -96,6 +96,15 @@ long strnlen_user(const char __user *str, long count)
 	if (unlikely(count <= 0))
 		return 0;
 
+	if (can_do_masked_user_access()) {
+		long retval;
+
+		str = masked_user_access_begin(str);
+		retval = do_strnlen_user(str, count, count);
+		user_read_access_end();
+		return retval;
+	}
+
 	max_addr = TASK_SIZE_MAX;
 	src_addr = (unsigned long)untagged_addr(str);
 	if (likely(src_addr < max_addr)) {
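The converted call sites above all follow the same shape. For
illustration only, a new caller of the interface would look something
like the sketch below; read_u64_from_user() is a hypothetical helper,
not part of the patch, modeled on the fs/select.c conversion:

	#include <linux/types.h>
	#include <linux/uaccess.h>

	/*
	 * Hypothetical example (not in the patch): read one u64 from
	 * user space using the masked path when available, falling back
	 * to the conditional user_read_access_begin() otherwise.
	 */
	static int read_u64_from_user(u64 *dst, const u64 __user *src)
	{
		if (can_do_masked_user_access())
			src = masked_user_access_begin(src);
		else if (!user_read_access_begin(src, sizeof(*src)))
			return -EFAULT;

		unsafe_get_user(*dst, src, Efault);
		user_read_access_end();
		return 0;

	Efault:
		user_read_access_end();
		return -EFAULT;
	}

On a masking architecture a bad pointer is turned into an address that
is guaranteed to fault, so the -EFAULT handling is reached through the
exception path rather than through an up-front conditional check.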