On Mon, Dec 26, 2022 at 7:08 PM Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> wrote: > > --- a/arch/x86/include/asm/uaccess.h > +++ b/arch/x86/include/asm/uaccess.h > @@ -21,6 +22,37 @@ static inline bool pagefault_disabled(void); > # define WARN_ON_IN_IRQ() > #endif > > +#ifdef CONFIG_X86_64 I think this should be CONFIG_ADDRESS_MASKING or something like that. This is not a "64 vs 32-bit feature". This is something else. Even if you then were to select it unconditionally for 64-bit kernels (but why would you?) it reads better if the #ifdef's make sense. > +#define __untagged_addr(mm, addr) ({ \ > + u64 __addr = (__force u64)(addr); \ > + s64 sign = (s64)__addr >> 63; \ > + __addr &= READ_ONCE((mm)->context.untag_mask) | sign; \ Now the READ_ONCE() doesn't make much sense. There shouldn't be any data races on that thing. Plus: > +#define untagged_addr(addr) __untagged_addr(current->mm, addr) I think this should at least allow caching it in 'current' without the mm indirection. In fact, it might be even better off as a per-cpu variable. Because it is now in somewhat crititcal code sections: > -#define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); }) > +#define get_user(x,ptr) \ > +({ \ > + might_fault(); \ > + do_get_user_call(get_user,x,untagged_ptr(ptr)); \ > +}) This is disgusting and wrong. The whole reason we do do_get_user_call() as a function call is because we *don't* want to do this kind of stuff at the call sites. We used to inline it all, but with all the clac/stac and access_ok checks, it all just ended up ballooning so much that it was much better to make it a special function call with particular calling conventions. That untagged_ptr() should be done in that asm function, not in every call site. Now, the sad part is that we got *rid* of all this kind of crap not that long ago when Christoph cleaned up the old legacy set_fs() mess, and we were able to make the task limit be a constant (ok, be _two_ constants, depending on LA57). So we'd have to re-introduce that nasty "look up task size dynamically". See commit 47058bb54b57 ("x86: remove address space overrides using set_fs()") for the removal that would have to be re-instated. But see above about "maybe it should be a per-cpu variable" - and making that ALTERNATIVE th8ing even nastier. Another alternative mght be to *only* test the sign bit in the get_user/put_user functions, and just take the fault instead. Right now we warn about non-canonical addresses because it implies somebody might have missed an access_ok(), but we'd just mark those get_user/put_user accesses special. That would get this all entirely off the critical path. Most other address masking is for relatively rare things (ie mmap/munmap), but the user accesses are hot. Hmm? Linus