On Mon, Sep 9, 2024 at 6:42 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Mon, Sep 09, 2024, James Houghton wrote: > > I take back what I said about this working on x86. I think it's > > possible for there to be a race. > > > > Say... > > > > 1. T1 modifies pte_list_desc then unlocks kvm_rmap_unlock(). > > 2. T2 then locks kvm_rmap_lock_readonly(). > > > > The modifications that T1 has made are not guaranteed to be visible to > > T2 unless T1 has an smp_wmb() (or equivalent) after the modfication > > and T2 has an smp_rmb() before reading the data. > > > > Now the way you had it, T2, because it uses try_cmpxchg() with full > > ordering, will effectively do a smp_rmb(). But T1 only does an > > smp_wmb() *after dropping the mmu_lock*, so there is a race. While T1 > > still holds the mmu_lock but after releasing the kvm_rmap_lock(), T2 > > may enter its critical section and then *later* observe the changes > > that T1 made. > > > > Now this is impossible on x86 (IIUC) if, in the compiled list of > > instructions, T1's writes occur in the same order that we have written > > them in C. I'm not sure if WRITE_ONCE guarantees that this reordering > > at compile time is forbidden. > > > > So what I'm saying is: > > > > 1. kvm_rmap_unlock() must have an smp_wmb(). > > No, because beating a dead horse, this is not generic code, this is x86. What prevents the compiler from reordering (non-atomic, non-volatile) stores that happen before WRITE_ONCE() in kvm_rmap_unlock() to after the WRITE_ONCE()? IMV, such a reordering is currently permitted[1] (i.e., a barrier() is missing), and should the compiler choose to do this, the lock will not function correctly. > If kvm_rmap_head.val were an int, i.e. could be unionized with an atomic_t, then > I wouldn't be opposed to doing this in the locking code to document things: > > s/READ_ONCE/atomic_read_acquire > s/WRITE_ONCE/atomic_set_release > s/try_cmpxchg/atomic_cmpxchg_acquire I think we can use atomic_long_t. It would be really great if we did a substitution like this. That would address my above concern about barrier() (atomic_set_release, for example, implies a barrier() that we otherwise need to include). [1]: https://www.kernel.org/doc/Documentation/memory-barriers.txt (GUARANTEES + COMPILER BARRIER)