On Fri, Jul 01, 2022 at 10:41:08PM +1200, Kai Huang <kai.huang@xxxxxxxxx> wrote:
> On Mon, 2022-06-27 at 14:53 -0700, isaku.yamahata@xxxxxxxxx wrote:
> > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> >
> > For kvm mmu that has shared bit mask, zap only leaf SPTEs when
> > deleting/moving a memslot. The existing kvm_mmu_zap_memslot() depends on
>
> Unless I am mistaken, I don't see there's an 'existing' kvm_mmu_zap_memslot().

Oops. It should be kvm_tdp_mmu_invalidate_all_roots().

> > role.invalid with read lock of mmu_lock so that other vcpu can operate on
> > kvm mmu concurrently.
> >
> > Mark the root page table invalid, unlink it from page
> > table pointer of CPU, process the page table.
>
> Are you talking about the behaviour of existing code, or the change you are
> going to make? Looks like you mean the latter but I believe it's the former.

The existing code. The sentence should read "It marks ...".

> > It doesn't work for private
> > page table to unlink the root page table because it requires all SPTE entry
> > to be non-present.
>
> I don't think we can truly *unlink* the private root page table from secure
> EPTP, right? The EPTP (root table) is fixed (and hidden) during TD's runtime.
>
> I guess you are trying to say: removing/unlinking one secure-EPT page requires
> removing/unlinking all its children first?

That's right. I'll update it as follows.

> So the reason to only zap leaf is we cannot truly unlink the private root page
> table, correct? Sorry your changelog is not obvious to me.

The reason is that, to unlink a page table from the parent's SPTE, all SPTEs
of the page table need to be non-present.

Here is the updated commit message.

KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot for private mmu

| For kvm mmu that has shared bit mask, zap only leaf SPTEs when
| deleting/moving a memslot. The existing kvm_tdp_mmu_invalidate_all_roots()
| depends on role.invalid with read lock of mmu_lock so that other vcpu can
| operate on kvm mmu concurrently. It marks the root page table invalid and
| zaps SPTEs of the root page tables.
|
| It doesn't work to unlink a private page table from the parent's SPTE entry
| because it requires all SPTE entries of the page table to be non-present.
| Instead, take the write-lock of mmu_lock and zap only leaf SPTEs for kvm
| mmu with shared bit mask.

> > Instead, with write-lock of mmu_lock and zap only leaf
> > SPTEs for kvm mmu with shared bit mask.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 35 ++++++++++++++++++++++++++++++++++-
> >  1 file changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 80d7c7709af3..c517c7bca105 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5854,11 +5854,44 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> >  	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> >  }
> >
> > +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > +{
> > +	bool flush = false;
> > +
> > +	write_lock(&kvm->mmu_lock);
> > +
> > +	/*
> > +	 * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> > +	 * case scenario we'll have unused shadow pages lying around until they
> > +	 * are recycled due to age or when the VM is destroyed.
> > +	 */
> > +	if (is_tdp_mmu_enabled(kvm)) {
> > +		struct kvm_gfn_range range = {
> > +			.slot = slot,
> > +			.start = slot->base_gfn,
> > +			.end = slot->base_gfn + slot->npages,
> > +			.may_block = false,
> > +		};
> > +
> > +		flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush);
>
> It appears you only unmap private GFNs (because the base_gfn doesn't have shared
> bit)? I think shared mapping in this slot must be zapped too?
>
> How is this done? Or the kvm_tdp_mmu_unmap_gfn_range() also zaps shared
> mappings?

kvm_tdp_mmu_unmap_gfn_range() handles both private gfn and shared gfn as they
are aliases.

> It's hard to review if one patch's behaviour/logic depends on further patches.

I'll add a comment on the call.
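Something like the below, on top of this patch (tentative wording only, to
show what the comment would say; the actual hunk will be in the next version):

	/*
	 * Private gfn and shared gfn are aliases of the same gfn, so
	 * kvm_tdp_mmu_unmap_gfn_range() zaps both the shared and the
	 * private mappings of the range, even though range.start is
	 * built from slot->base_gfn without the shared bit.
	 */
	flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush);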
-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>