On Tue, Sep 3, 2024 at 2:27 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Tue, Sep 03, 2024, James Houghton wrote: > > On Fri, Aug 9, 2024 at 12:44 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > +/* > > > + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU always > > > + * operates with mmu_lock held for write), but rmaps can be walked without > > > + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap chain > > > + * being zapped/dropped _while the rmap is locked_. > > > + * > > > + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must be > > > + * done while holding mmu_lock for write. This allows a task walking rmaps > > > + * without holding mmu_lock to concurrently walk the same entries as a task > > > + * that is holding mmu_lock but _not_ the rmap lock. Neither task will modify > > > + * the rmaps, thus the walks are stable. > > > + * > > > + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOCKED, > > > + * only the rmap chains themselves are protected. E.g. holding an rmap's lock > > > + * ensures all "struct pte_list_desc" fields are stable. > > > + */ > > > +#define KVM_RMAP_LOCKED BIT(1) > > > + > > > +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) > > > +{ > > > + unsigned long old_val, new_val; > > > + > > > + old_val = READ_ONCE(rmap_head->val); > > > + if (!old_val) > > > + return 0; > > > > I'm having trouble understanding how this bit works. What exactly is > > stopping the rmap from being populated while we have it "locked"? > > Nothing prevents the 0=>1 transition, but that's a-ok because walking rmaps for > aging only cares about existing mappings. The key to correctness is that this > helper returns '0' when there are no rmaps, i.e. the caller is guaranteed to do > nothing and thus will never see any rmaps that come along in the future. > > > aren't holding the MMU lock at all in the lockless case, and given > > this bit, it is impossible (I think?) for the MMU-write-lock-holding, > > rmap-modifying side to tell that this rmap is locked. > > > > Concretely, my immediate concern is that we will still unconditionally > > write 0 back at unlock time even if the value has changed. > > The "readonly" unlocker (not added in this patch) is a nop if the rmap was empty, > i.e. wasn't actually locked. Ah, that's what I was missing. Thanks! This all makes sense now. > > +static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, > + unsigned long old_val) > +{ > + if (!old_val) > + return; > + > + KVM_MMU_WARN_ON(old_val != (rmap_head->val & ~KVM_RMAP_LOCKED)); > + WRITE_ONCE(rmap_head->val, old_val); > +} > > The TODO in kvm_rmap_lock() pretty much sums things up: it's safe to call the > "normal", non-readonly versions if and only if mmu_lock is held for write. > > +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) > +{ > + /* > + * TODO: Plumb in @kvm and add a lockdep assertion that mmu_lock is > + * held for write. > + */ > + return __kvm_rmap_lock(rmap_head); > +}