On Tue, Aug 09, 2022, Paolo Bonzini wrote:
> On 8/9/22 16:44, Sean Christopherson wrote:
> > On Tue, Aug 09, 2022, Paolo Bonzini wrote:
> > > and (4) is definitely ordered after (1) thanks to the READ_ONCE hidden
> > > within (3) and the data dependency from old_spte to sp.
> >
> > Yes, I think that's correct.  Callers must verify the SPTE is present before getting
> > the associated child shadow page.  KVM does have instances where a shadow page is
> > retrieved from the SPTE _pointer_, but that's the parent shadow page, i.e. isn't
> > guarded by the SPTE being present.
> >
> >	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
> >
> > Something like this as a separate patch?
>
> Would you resubmit without the memory barriers then?

Ya.

> > diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> > index f0af385c56e0..9d982ccf4567 100644
> > --- a/arch/x86/kvm/mmu/tdp_iter.h
> > +++ b/arch/x86/kvm/mmu/tdp_iter.h
> > @@ -13,6 +13,12 @@
> >   * to be zapped while holding mmu_lock for read, and to allow TLB flushes to be
> >   * batched without having to collect the list of zapped SPs.  Flows that can
> >   * remove SPs must service pending TLB flushes prior to dropping RCU protection.
> > + *
> > + * The READ_ONCE() ensures that, if the SPTE points at a child shadow page, all
> > + * fields in struct kvm_mmu_page will be read after the caller observes the
> > + * present SPTE (KVM must check that the SPTE is present before following the
> > + * SPTE's pfn to its associated shadow page).  Pairs with the implicit memory
>
> I guess you mean both the shadow page table itself and the struct
> kvm_mmu_page?

Yes.  It's a bug if KVM ever consumes a SPTE's pfn (to read the next level of
SPTEs, or via pfn_to_page() to get the kvm_mmu_page) without first checking
that the SPTE is MMU-present.

> Or do you think to_shadow_page() should have a smp_rmb()?

I believe we're ok.  If any SP fields are toggled after the SP is marked present,
then we'd need separate barriers, e.g. similar to is_unsync_root() +
mmu_try_to_unsync_pages() (ignoring that we've since added a spinlock in the
"try to unsync" path).
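
FWIW, if it helps to see the ordering spelled out, here's a userspace sketch of
the pattern (emphatically not KVM code: toy_mmu_page, pfn_to_sp, SPTE_PRESENT,
etc. are made-up stand-ins, and C11 memory_order_consume/memory_order_release
stand in for READ_ONCE() + the address dependency on the read side, and for
making the SPTE present only after the child is initialized on the write side):

/* spte_publish.c - toy model of the publish/consume pattern, not KVM code. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define SPTE_PRESENT	(1ull << 0)
#define SPTE_PFN_SHIFT	12

struct toy_mmu_page {
	uint64_t gfn;
	int level;
};

static struct toy_mmu_page child_sp;		/* "pfn" 1 maps to this */
static struct toy_mmu_page *pfn_to_sp[2] = { NULL, &child_sp };

static _Atomic uint64_t spte_slot;		/* the SPTE being published */

static void *writer(void *arg)
{
	(void)arg;

	/* Fully initialize the child before making the SPTE present. */
	child_sp.gfn = 0x1234;
	child_sp.level = 1;

	/* Publish: the release store models making the SPTE MMU-present. */
	atomic_store_explicit(&spte_slot,
			      ((uint64_t)1 << SPTE_PFN_SHIFT) | SPTE_PRESENT,
			      memory_order_release);
	return NULL;
}

static void *reader(void *arg)
{
	(void)arg;

	/* Read the SPTE exactly once (READ_ONCE()); never re-read it later. */
	uint64_t spte = atomic_load_explicit(&spte_slot, memory_order_consume);

	/* Must check present _before_ consuming the pfn. */
	if (!(spte & SPTE_PRESENT))
		return NULL;

	/* The spte -> sp address dependency orders the field reads below. */
	struct toy_mmu_page *sp = pfn_to_sp[spte >> SPTE_PFN_SHIFT];

	printf("gfn = 0x%llx, level = %d\n",
	       (unsigned long long)sp->gfn, sp->level);
	return NULL;
}

int main(void)
{
	pthread_t w, r;

	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&r, NULL, reader, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return 0;
}

The single SPTE load plus the data dependency to sp is what makes the sp->gfn
and sp->level reads safe without an explicit smp_rmb(); that only covers state
that is fully initialized before the SPTE is made present, which is why a field
flipped after publication needs its own barrier pairing, a la is_unsync_root()
+ mmu_try_to_unsync_pages().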