On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@xxxxxxxxxxxx>
>
> FNAME(page_fault) verifies PDPTE for nested NPT in PAE paging mode
> because nested_svm_get_tdp_pdptr() reads the guest NPT's PDPTE from
> memory unconditionally for each call.
>
> The verifying is complicated and it works only when mmu->pae_root
> is always used when the guest is PAE paging.

Why is this relevant?  It's not _that_ complicated, and even if it were,
I don't see how calling that out helps the reader understand the
motivation for this patch.

> Move the verifying code in FNAME(fetch) and simplify it since the local
> shadow page is used and it can be walked in FNAME(fetch) and unlinked
> from children via drop_spte().
>
> It also allows for mmu->pae_root NOT to be used when it is NOT required

Avoid leading with pronouns, "it" is ambiguous, e.g. at first I thought
"it" meant moving the code, but what "it" really means is using the
iterator from the shadow page walk instead of hardcoding a pae_root
lookup.

And changing from pae_root to it.sptep needs to be explicitly called
out.  It's a subtle but important detail.  And if you call that out,
then it's more obvious why this patch is relevant to not having to use
pae_root for a 64-bit host with NPT.

> to be put in a 32bit CR3.
>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@xxxxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/paging_tmpl.h | 72 ++++++++++++++++------------------
>  1 file changed, 33 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index cd6032e1947c..67c419bce1e5 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -659,6 +659,39 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>  		clear_sp_write_flooding_count(it.sptep);
>  		drop_large_spte(vcpu, it.sptep);
>
> +		/*
> +		 * When nested NPT enabled and L1 is PAE paging,
> +		 * mmu->get_pdptrs() which is nested_svm_get_tdp_pdptr() reads
> +		 * the guest NPT's PDPTE from memory unconditionally for each
> +		 * call.
> +		 *
> +		 * The guest PAE root page is not write-protected.
> +		 *
> +		 * The mmu->get_pdptrs() in FNAME(walk_addr_generic) might get
> +		 * a value different from previous calls or different from the
> +		 * return value of mmu->get_pdptrs() in mmu_alloc_shadow_roots().
> +		 *
> +		 * It will cause the following code installs the spte in a wrong
> +		 * sp or links a sp to a wrong parent if the return value of
> +		 * mmu->get_pdptrs() is not verified unchanged since
> +		 * FNAME(gpte_changed) can't check this kind of change.
> +		 *
> +		 * Verify the return value of mmu->get_pdptrs() (only the gfn
> +		 * in it needs to be checked) and drop the spte if the gfn isn't
> +		 * matched.
> +		 *
> +		 * Do the verifying unconditionally when the guest is PAE
> +		 * paging no matter whether it is nested NPT or not to avoid
> +		 * complicated code.
> +		 */
> +		if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL &&
> +		    it.level == PT32E_ROOT_LEVEL &&
> +		    is_shadow_present_pte(*it.sptep)) {
> +			sp = to_shadow_page(*it.sptep & PT64_BASE_ADDR_MASK);

For this patch, it's probably worth a

	WARN_ON_ONCE(sp->spt != vcpu->arch.mmu->pae_root);

Mostly so that when the future patch stops using pae_root for 64-bit NPT
hosts, there's a code change for this particular logic that is very much
relevant to that change.

> +			if (gw->table_gfn[it.level - 2] != sp->gfn)
> +				drop_spte(vcpu->kvm, it.sptep);
> +		}
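
To illustrate the placement I have in mind (untested sketch, reusing
only the identifiers already in the hunk):

		if (vcpu->arch.mmu->cpu_role.base.level == PT32E_ROOT_LEVEL &&
		    it.level == PT32E_ROOT_LEVEL &&
		    is_shadow_present_pte(*it.sptep)) {
			sp = to_shadow_page(*it.sptep & PT64_BASE_ADDR_MASK);

			/*
			 * PAE paging is expected to shadow through pae_root
			 * until the future patch stops using pae_root for
			 * 64-bit hosts with NPT.
			 */
			WARN_ON_ONCE(sp->spt != vcpu->arch.mmu->pae_root);

			if (gw->table_gfn[it.level - 2] != sp->gfn)
				drop_spte(vcpu->kvm, it.sptep);
		}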