Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 11 Sep 2024 08:33:53 -0700

On Tue, Sep 10, 2024, James Houghton wrote:
> On Mon, Sep 9, 2024 at 6:42 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Mon, Sep 09, 2024, James Houghton wrote:
> > > I take back what I said about this working on x86. I think it's
> > > possible for there to be a race.
> > >
> > > Say...
> > >
> > > 1. T1 modifies pte_list_desc then unlocks kvm_rmap_unlock().
> > > 2. T2 then locks kvm_rmap_lock_readonly().
> > >
> > > The modifications that T1 has made are not guaranteed to be visible to
> > > T2 unless T1 has an smp_wmb() (or equivalent) after the modfication
> > > and T2 has an smp_rmb() before reading the data.
> > >
> > > Now the way you had it, T2, because it uses try_cmpxchg() with full
> > > ordering, will effectively do a smp_rmb(). But T1 only does an
> > > smp_wmb() *after dropping the mmu_lock*, so there is a race. While T1
> > > still holds the mmu_lock but after releasing the kvm_rmap_lock(), T2
> > > may enter its critical section and then *later* observe the changes
> > > that T1 made.
> > >
> > > Now this is impossible on x86 (IIUC) if, in the compiled list of
> > > instructions, T1's writes occur in the same order that we have written
> > > them in C. I'm not sure if WRITE_ONCE guarantees that this reordering
> > > at compile time is forbidden.
> > >
> > > So what I'm saying is:
> > >
> > > 1. kvm_rmap_unlock() must have an smp_wmb().
> >
> > No, because beating a dead horse, this is not generic code, this is x86.
> 
> What prevents the compiler from reordering (non-atomic, non-volatile)
> stores that happen before WRITE_ONCE() in kvm_rmap_unlock() to after
> the WRITE_ONCE()?

Oof, right, nothing.  Which is why __smp_store_release() has an explicit
barrier() before its WRITE_ONCE().

> IMV, such a reordering is currently permitted[1] (i.e., a barrier() is
> missing), and should the compiler choose to do this, the lock will not
> function correctly.
> 
> > If kvm_rmap_head.val were an int, i.e. could be unionized with an atomic_t, then
> > I wouldn't be opposed to doing this in the locking code to document things:
> >
> >  s/READ_ONCE/atomic_read_acquire
> >  s/WRITE_ONCE/atomic_set_release
> >  s/try_cmpxchg/atomic_cmpxchg_acquire
> 
> I think we can use atomic_long_t.

Duh.  That's a /facepalm moment.

> It would be really great if we did a substitution like this. That
> would address my above concern about barrier() (atomic_set_release,
> for example, implies a barrier() that we otherwise need to include).

Ya, agreed, not just for warm fuzzies, but because it's necessary to prevent
the compiler from being clever.