On Thu, Jan 20, 2022, Peter Zijlstra wrote:
> Do try_cmpxchg() loops on userspace addresses.
>
> Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
>  arch/x86/include/asm/uaccess.h | 67 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 67 insertions(+)
>
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -342,6 +342,24 @@ do {						\
> 		     : [umem] "m" (__m(addr))		\
> 		     : : label)
>
> +#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label)	({ \
> +	bool success;						\
> +	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);	\
> +	__typeof__(*(_ptr)) __old = *_old;			\
> +	__typeof__(*(_ptr)) __new = (_new);			\
> +	asm_volatile_goto("\n"					\
> +		     "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
> +		     _ASM_EXTABLE_UA(1b, %l[label])		\
> +		     : CC_OUT(z) (success),			\
> +		       [ptr] "+m" (*_ptr),			\
> +		       [old] "+a" (__old)			\
> +		     : [new] ltype (__new)			\
> +		     : "memory", "cc"				\

IIUC, the "cc" clobber is unnecessary, as CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
implies __GCC_ASM_FLAG_OUTPUTS__=y, i.e. CC_OUT() will resolve to "=@cc".

> +		     : label);					\
> +	if (unlikely(!success))					\
> +		*_old = __old;					\
> +	likely(success); })
> +
>  #else // !CONFIG_CC_HAS_ASM_GOTO_OUTPUT

...

> +extern void __try_cmpxchg_user_wrong_size(void);
> +
> +#define unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({	\
> +	__typeof__(*(_ptr)) __ret;				\

This should probably be a bool, as the return from the lower level helpers is
a bool that's true if the exchange succeeded.  Declaring the type of the
target implies that they return the raw result, which is confusing.
> +	switch (sizeof(__ret)) {				\
> +	case 1: __ret = __try_cmpxchg_user_asm("b", "q",	\
> +					       (_ptr), (_oldp),	\
> +					       (_nval), _label);\
> +		break;						\
> +	case 2: __ret = __try_cmpxchg_user_asm("w", "r",	\
> +					       (_ptr), (_oldp),	\
> +					       (_nval), _label);\
> +		break;						\
> +	case 4: __ret = __try_cmpxchg_user_asm("l", "r",	\
> +					       (_ptr), (_oldp),	\
> +					       (_nval), _label);\
> +		break;						\
> +	case 8: __ret = __try_cmpxchg_user_asm("q", "r",	\
> +					       (_ptr), (_oldp),	\
> +					       (_nval), _label);\

Doh, I should have specified that KVM needs 8-byte CMPXCHG on 32-bit kernels
due to using it to atomically update guest PAE PTEs and LTR descriptors (yay).

Also, KVM's use case isn't a tight loop; how gross would it be to add a
slightly less unsafe version that does __uaccess_begin_nospec()?  KVM
pre-checks the address way ahead of time, so the access_ok() check can be
omitted.  Alternatively, KVM could add its own macro, but that seems a little
silly.

E.g. something like this, though I don't think this is correct (something is
getting inverted somewhere and the assembly output is a nightmare):

/* "Returns" 0 on success, 1 on failure, -EFAULT if the access faults. */
#define ___try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({	\
	int ____ret = -EFAULT;					\
	__uaccess_begin_nospec();				\
	____ret = !unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label);	\
_label:								\
	__uaccess_end();					\
	____ret;						\
})

Lastly, assuming I get my crap working, mind if I post a variant (Cc'd to
stable@) in the context of a KVM series?  Turns out KVM has an ugly bug where
it completely botches the pfn calculation of memory it remaps and
accesses[*]; the easiest fix is to switch to __try_cmpxchg_user() and purge
the nastiness.

[*] https://lore.kernel.org/all/20220124172633.103323-1-tadeusz.struk@xxxxxxxxxx