On Mon, Jun 14, 2021, David Matlack wrote: > On Mon, Jun 14, 2021 at 10:59:14PM +0000, Sean Christopherson wrote: > > The two use cases (and the only common use cases I can see) have fairly different > > requirements. The MMIO check wants the SPTEs at _all_ levels, whereas the fast > > page fault handler wants the SPTE _and_ its pointer at a single level. So I > > wonder if by providing a super generic API we'd actually increase complexity. > > > > I.e. rather than provide a completely generic API, maybe it would be better to > > have two distinct API. That wouldn't fix the tdp_ptep_t issue, but it would at > > least bound it to some degree and make the code more obvious. > > Does the tdp_ptep_t issue go away if kvm_tdp_mmu_get_spte_lockless > returns an rcu_dereference'd version of the pointer? See below. Sort of? > > u64 *kvm_tdp_mmu_get_spte_lockless(struct kvm_vcpu *vcpu, u64 addr, u64 *spte) > > { > > struct kvm_mmu *mmu = vcpu->arch.mmu; > > gfn_t gfn = addr >> PAGE_SHIFT; > > struct tdp_iter iter; > > u64 *sptep = NULL; > > > > *spte = 0ull; > > > > tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { > > /* > > * Here be a comment about the unfortunate differences between > > * the TDP MMU and the legacy MMU. > > */ > > sptep = (u64 * __force)iter.sptep; > > Instead, should this be: > > sptep = rcu_dereference(iter.sptep); > > ? It's not wrong per se, but it's cheating in some sense. The problem is that it's not the pointer itself that's RCU-protected, rather it's what it's pointing at (the page tables) that's RCU-protected. E.g. if this were reading the _value_, it would be a-ok to do: spte = READ_ONCE(*rcu_dereference(iter.sptep)); and return the value because the caller gets a copy of the RCU-protected data. Reading multiple times from a single dereference is also ok, e.g. see kvm_bitmap_or_dest_vcpus() and several other APIC helpers, but note how they all take and release the RCU lock in a single block and don't return the protected pointer to the caller. Stripping the __rcu annotation without actually being able to guarantee that the caller is going to do the right thing with the pointer is why I say it's "cheating". E.g. if the caller were to do: u64 get_spte_lockess_broken(...) { u64 *sptep; rcu_read_lock(); sptep = kvm_tdp_mmu_get_spte_lockless(...); rcu_read_unlock(); return READ_ONCE(*sptep); } it would violate RCU but Sparse would be none the wiser. The __force ugliness is also cheating, but it's also loudly stating that we're intentionally cheating, hence the request for a comment. > > *spte = iter.old_spte; > > } > > return sptep; > > } > >