On Fri, Jun 11, 2021, David Matlack wrote:
> In order to use walk_shadow_page_lockless() in fast_page_fault() we need
> to also record the spteps.
>
> No functional change intended.
>
> Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/mmu.c          | 1 +
>  arch/x86/kvm/mmu/mmu_internal.h | 3 +++
>  arch/x86/kvm/mmu/tdp_mmu.c      | 1 +
>  3 files changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 8140c262f4d3..765f5b01768d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3538,6 +3538,7 @@ static bool walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
>  		spte = mmu_spte_get_lockless(it.sptep);
>  		walk->last_level = it.level;
>  		walk->sptes[it.level] = spte;
> +		walk->spteps[it.level] = it.sptep;
>
>  		if (!is_shadow_present_pte(spte))
>  			break;
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 26da6ca30fbf..0fefbd5d6c95 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -178,6 +178,9 @@ struct shadow_page_walk {
>
>  	/* The spte value at each level. */
>  	u64 sptes[PT64_ROOT_MAX_LEVEL + 1];
> +
> +	/* The spte pointers at each level. */
> +	u64 *spteps[PT64_ROOT_MAX_LEVEL + 1];

Hrm.  I'm not sure how I feel about this patch, or about shadow_page_walk
in general.

On the one hand, I like having a common API.  On the other hand, I really
don't like mixing two different protection schemes, e.g. this really should
be

	tdp_ptep_t spteps;

in order to gain the RCU annotations for TDP, but those RCU annotations are
problematic because the legacy MMU doesn't use RCU to protect its page
tables.
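To make the clash concrete, a rough sketch (the typedef is the one in
tdp_iter.h; the fields are the ones from the patch above, and the comment
is the illustrative part):

	/* tdp_iter.h: TDP MMU SPTE pointers carry the __rcu address space. */
	typedef u64 __rcu *tdp_ptep_t;

	struct shadow_page_walk {
		int last_level;
		u64 sptes[PT64_ROOT_MAX_LEVEL + 1];
		/*
		 * As a plain u64 *, TDP walks have to __force cast away the
		 * __rcu annotation when storing iter.sptep here.  As a
		 * tdp_ptep_t, sparse would instead flag every legacy MMU
		 * store/load, since the legacy MMU doesn't use RCU.
		 */
		u64 *spteps[PT64_ROOT_MAX_LEVEL + 1];
	};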
I also don't like forcing the caller to hold the "lock" for longer than is
necessary, e.g. get_mmio_spte() doesn't require access to the page tables
after the initial walk, but the common spteps[] kinda sorta forces its hand.

The two use cases (and the only common use cases I can see) have fairly
different requirements.  The MMIO check wants the SPTEs at _all_ levels,
whereas the fast page fault handler wants the SPTE _and_ its pointer at a
single level.  So I wonder if by providing a super generic API we'd
actually increase complexity.

I.e. rather than provide a completely generic API, maybe it would be better
to have two distinct APIs.  That wouldn't fix the tdp_ptep_t issue, but it
would at least bound it to some degree and make the code more obvious.  I
suspect it would also reduce the code churn, though that's not necessarily
a goal in and of itself.

E.g. for fast_page_fault():

	walk_shadow_page_lockless_begin(vcpu);

	do {
		sptep = get_spte_lockless(..., &spte);
		if (!is_shadow_present_pte(spte))
			break;

		sp = sptep_to_sp(sptep);
		if (!is_last_spte(spte, sp->role.level))
			break;

		...
	} while (true);

	walk_shadow_page_lockless_end(vcpu);

and for get_mmio_spte(), where the walk can be ended before the error check
since only the SPTE values are needed afterwards:

	walk_shadow_page_lockless_begin(vcpu);
	leaf = get_sptes_lockless(vcpu, addr, sptes, &root);
	walk_shadow_page_lockless_end(vcpu);

	if (unlikely(leaf < 0)) {
		*sptep = 0ull;
		return reserved;
	}

And if we look at the TDP MMU implementations, they aren't sharing _that_
much code, and the code that is shared isn't all that interesting, e.g. if
we really wanted to we could macro-magic away the boilerplate, but I think
even I would balk at the result :-)

int kvm_tdp_mmu_get_sptes_lockless(struct kvm_vcpu *vcpu, u64 addr,
				   u64 *sptes, int *root_level)
{
	struct tdp_iter iter;
	struct kvm_mmu *mmu = vcpu->arch.mmu;
	gfn_t gfn = addr >> PAGE_SHIFT;
	int leaf = -1;

	*root_level = vcpu->arch.mmu->shadow_root_level;

	tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
		leaf = iter.level;
		sptes[leaf] = iter.old_spte;
	}

	return leaf;
}

u64 *kvm_tdp_mmu_get_spte_lockless(struct kvm_vcpu *vcpu, u64 addr,
				   u64 *spte)
{
	struct kvm_mmu *mmu = vcpu->arch.mmu;
	gfn_t gfn = addr >> PAGE_SHIFT;
	struct tdp_iter iter;
	u64 *sptep = NULL;

	*spte = 0ull;

	tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
		/*
		 * Here be a comment about the unfortunate differences between
		 * the TDP MMU and the legacy MMU.
		 */
		sptep = (u64 * __force)iter.sptep;
		*spte = iter.old_spte;
	}

	return sptep;
}
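If we did go the two-API route, the mmu.c side could stay a thin
dispatcher.  A rough sketch, assuming an is_tdp_mmu(mmu) helper is
available; get_shadow_spte_lockless() is purely a placeholder name for
whatever the legacy MMU walker ends up being called:

	static u64 *get_spte_lockless(struct kvm_vcpu *vcpu, u64 addr,
				      u64 *spte)
	{
		/*
		 * Dispatch to the TDP MMU or the legacy MMU walker; both
		 * return the sptep (or NULL) and fill in the SPTE value.
		 */
		if (is_tdp_mmu(vcpu->arch.mmu))
			return kvm_tdp_mmu_get_spte_lockless(vcpu, addr, spte);

		/* Placeholder: the legacy (shadow) MMU equivalent. */
		return get_shadow_spte_lockless(vcpu, addr, spte);
	}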