On Thu, Feb 15, 2024, Oliver Upton wrote:
> On Thu, Feb 15, 2024 at 01:33:48PM -0800, Sean Christopherson wrote:
> 
> [...]
> 
> > +/* TODO: Expand this madness to also support u8, u16, and u32 operands. */
> > +#define vcpu_arch_put_guest(mem, val, rand)					\
> > +do {										\
> > +	if (!is_forced_emulation_enabled || !(rand & 1)) {			\
> > +		*mem = val;							\
> > +	} else if (rand & 2) {							\
> > +		__asm__ __volatile__(KVM_FEP "movq %1, %0"			\
> > +				     : "+m" (*mem)				\
> > +				     : "r" (val) : "memory");			\
> > +	} else {								\
> > +		uint64_t __old = READ_ONCE(*mem);				\
> > +										\
> > +		__asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchgq %[new], %[ptr]"	\
> > +				     : [ptr] "+m" (*mem), [old] "+a" (__old)	\
> > +				     : [new]"r" (val) : "memory", "cc");	\
> > +	}									\
> > +} while (0)
> > +
> 
> Last bit of bikeshedding then I'll go... Can you just use a C function
> and #define it so you can still do ifdeffery to slam in a default
> implementation?

Yes, but the macro shenanigans aren't to create a default, they're to set the
stage for expanding to other sizes without having to do:

  vcpu_arch_put_guest{8,16,32,64}()

or if we like bytes instead of bits:

  vcpu_arch_put_guest{1,2,4,8}()

I'm not completely against that approach; it's not _that_ much copy+paste
boilerplate, but it's enough that I think that macros would be a clear win,
especially if we want to expand what instructions are used.

<me fiddles around>

Actually, I take that back, I am against that approach :-)

I was expecting to have to do some switch() explosion to get the CMPXCHG stuff
working, but I'm pretty sure the mess that is the kernel's
unsafe_try_cmpxchg_user() and __put_user_size() is almost entirely due to
needing to support 32-bit kernels, or maybe some need to strictly control the
asm constraints.

For selftests, AFAICT the below Just Works on gcc and clang for legal sizes.
And as a bonus, we can sanity check that the pointer and value are of the same
size.  Which we definitely should do, otherwise the compiler has a nasty habit
of using the size of the right hand side value for the asm blobs, e.g. this

	vcpu_arch_put_guest((u8 *)addr, (u32)val, rand);

generates 32-bit accesses.  Oof.

#define vcpu_arch_put_guest(mem, val, rand)					\
do {										\
	kvm_static_assert(sizeof(*mem) == sizeof(val));				\
	if (!is_forced_emulation_enabled || !(rand & 1)) {			\
		*mem = val;							\
	} else if (rand & 2) {							\
		__asm__ __volatile__(KVM_FEP "mov %1, %0"			\
				     : "+m" (*mem)				\
				     : "r" (val) : "memory");			\
	} else {								\
		uint64_t __old = READ_ONCE(*mem);				\
										\
		__asm__ __volatile__(KVM_FEP LOCK_PREFIX "cmpxchg %[new], %[ptr]"	\
				     : [ptr] "+m" (*mem), [old] "+a" (__old)	\
				     : [new]"r" (val) : "memory", "cc");	\
	}									\
} while (0)
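
As a purely illustrative aside (not part of the patch): assuming the macro
above and its dependencies (kvm_static_assert, KVM_FEP, LOCK_PREFIX,
is_forced_emulation_enabled) are in scope as in the selftests, a caller-side
sketch like the one below shows the size check in action.  demo_put_guest()
and the stand-in values are hypothetical and exist only for the demo.

static void demo_put_guest(uint64_t rand)
{
	uint8_t  byte_val = 0;
	uint64_t quad_val = 0;

	/* 1-byte and 8-byte stores; pointer and value sizes must match. */
	vcpu_arch_put_guest(&byte_val, (uint8_t)0xaa, rand);
	vcpu_arch_put_guest(&quad_val, (uint64_t)0xdeadbeef, rand);

	/*
	 * Rejected at compile time by the kvm_static_assert(): an 8-bit
	 * destination with a 32-bit value would otherwise silently generate
	 * a 32-bit access.
	 */
	/* vcpu_arch_put_guest(&byte_val, (uint32_t)0xaa, rand); */
}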