On Tue, Sep 10, 2024, James Houghton wrote: > On Mon, Sep 9, 2024 at 6:42 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > > On Mon, Sep 09, 2024, James Houghton wrote: > > > I take back what I said about this working on x86. I think it's > > > possible for there to be a race. > > > > > > Say... > > > > > > 1. T1 modifies pte_list_desc then unlocks kvm_rmap_unlock(). > > > 2. T2 then locks kvm_rmap_lock_readonly(). > > > > > > The modifications that T1 has made are not guaranteed to be visible to > > > T2 unless T1 has an smp_wmb() (or equivalent) after the modfication > > > and T2 has an smp_rmb() before reading the data. > > > > > > Now the way you had it, T2, because it uses try_cmpxchg() with full > > > ordering, will effectively do a smp_rmb(). But T1 only does an > > > smp_wmb() *after dropping the mmu_lock*, so there is a race. While T1 > > > still holds the mmu_lock but after releasing the kvm_rmap_lock(), T2 > > > may enter its critical section and then *later* observe the changes > > > that T1 made. > > > > > > Now this is impossible on x86 (IIUC) if, in the compiled list of > > > instructions, T1's writes occur in the same order that we have written > > > them in C. I'm not sure if WRITE_ONCE guarantees that this reordering > > > at compile time is forbidden. > > > > > > So what I'm saying is: > > > > > > 1. kvm_rmap_unlock() must have an smp_wmb(). > > > > No, because beating a dead horse, this is not generic code, this is x86. > > What prevents the compiler from reordering (non-atomic, non-volatile) > stores that happen before WRITE_ONCE() in kvm_rmap_unlock() to after > the WRITE_ONCE()? Oof, right, nothing. Which is why __smp_store_release() has an explicit barrier() before its WRITE_ONCE(). > IMV, such a reordering is currently permitted[1] (i.e., a barrier() is > missing), and should the compiler choose to do this, the lock will not > function correctly. > > > If kvm_rmap_head.val were an int, i.e. could be unionized with an atomic_t, then > > I wouldn't be opposed to doing this in the locking code to document things: > > > > s/READ_ONCE/atomic_read_acquire > > s/WRITE_ONCE/atomic_set_release > > s/try_cmpxchg/atomic_cmpxchg_acquire > > I think we can use atomic_long_t. Duh. That's a /facepalm moment. > It would be really great if we did a substitution like this. That > would address my above concern about barrier() (atomic_set_release, > for example, implies a barrier() that we otherwise need to include). Ya, agreed, not just for warm fuzzies, but because it's necessary to prevent the compiler from being clever.