Re: [PATCH 5/8] KVM: x86/mmu: Also record spteps in shadow_page_walk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 14, 2021, David Matlack wrote:
> On Mon, Jun 14, 2021 at 10:59:14PM +0000, Sean Christopherson wrote:
> > The two use cases (and the only common use cases I can see) have fairly different
> > requirements.  The MMIO check wants the SPTEs at _all_ levels, whereas the fast
> > page fault handler wants the SPTE _and_ its pointer at a single level.  So I
> > wonder if by providing a super generic API we'd actually increase complexity.
> > 
> > I.e. rather than provide a completely generic API, maybe it would be better to
> > have two distinct API.  That wouldn't fix the tdp_ptep_t issue, but it would at
> > least bound it to some degree and make the code more obvious.
> 
> Does the tdp_ptep_t issue go away if kvm_tdp_mmu_get_spte_lockless
> returns an rcu_dereference'd version of the pointer? See below.

Sort of?

> > u64 *kvm_tdp_mmu_get_spte_lockless(struct kvm_vcpu *vcpu, u64 addr, u64 *spte)
> > {
> > 	struct kvm_mmu *mmu = vcpu->arch.mmu;
> > 	gfn_t gfn = addr >> PAGE_SHIFT;
> > 	struct tdp_iter iter;
> > 	u64 *sptep = NULL;
> > 
> > 	*spte = 0ull;
> > 
> > 	tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
> > 		/*
> > 		 * Here be a comment about the unfortunate differences between
> > 		 * the TDP MMU and the legacy MMU.
> > 		 */
> > 		sptep = (u64 * __force)iter.sptep;
> 
> Instead, should this be:
> 
> 		sptep = rcu_dereference(iter.sptep);
> 
> ?

It's not wrong per se, but it's cheating in some sense.

The problem is that it's not the pointer itself that's RCU-protected, rather it's
what it's pointing at (the page tables) that's RCU-protected.  E.g. if this were
reading the _value_, it would be a-ok to do:

	spte = READ_ONCE(*rcu_dereference(iter.sptep));

and return the value because the caller gets a copy of the RCU-protected data.

Reading multiple times from a single dereference is also ok, e.g. see
kvm_bitmap_or_dest_vcpus() and several other APIC helpers, but note how they all
take and release the RCU lock in a single block and don't return the protected
pointer to the caller.

Stripping the __rcu annotation without actually being able to guarantee that
the caller is going to do the right thing with the pointer is why I say it's
"cheating".  E.g. if the caller were to do:

  u64 get_spte_lockess_broken(...)
  {
	u64 *sptep;

	rcu_read_lock();
	sptep = kvm_tdp_mmu_get_spte_lockless(...);
	rcu_read_unlock();

	return READ_ONCE(*sptep);
  }

it would violate RCU but Sparse would be none the wiser.

The __force ugliness is also cheating, but it's also loudly stating that we're
intentionally cheating, hence the request for a comment.

> > 		*spte = iter.old_spte;
> > 	}
> > 	return sptep;
> > }
> > 



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux