Re: [RFC PATCH 13/13] KVM: nSVM: Stop bombing the TLB on nested transitions

On Mon, 2025-03-03 at 22:21 +0000, Yosry Ahmed wrote:
> On Fri, Feb 28, 2025 at 09:21:54PM -0500, Maxim Levitsky wrote:
> > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > Now that nested TLB flushes are properly tracked with a well-maintained
> > > separate ASID for L2 and proper handling of L1's TLB flush requests,
> > > drop the unconditional flushes and syncs on nested transitions.
> > > 
> > > On a Milan machine, an L1 and L2 guests were booted, both with a single
> > > vCPU, and pinned to a single physical CPU to maximize TLB collisions. In
> > > this setup, the cpuid_rate microbenchmark [1] showed the following
> > > changes with this patch:
> > > 
> > > +--------+--------+-------------------+----------------------+
> > > | L0     | L1     | cpuid_rate (base) | cpuid_rate (patched) |
> > > +========+========+===================+======================+
> > > | NPT    | NPT    | 256621            | 301113 (+17.3%)      |
> > > | NPT    | Shadow | 180017            | 203347 (+12.96%)     |
> > > | Shadow | Shadow | 177006            | 189150 (+6.86%)      |
> > > +--------+--------+-------------------+----------------------+
> > > 
> > > [1]https://lore.kernel.org/kvm/20231109180646.2963718-1-khorenko@xxxxxxxxxxxxx/
> > > 
> > > Signed-off-by: Yosry Ahmed <yosry.ahmed@xxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/svm/nested.c | 7 -------
> > >  1 file changed, 7 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > index 8e40ff21f7353..45a187d4c23d1 100644
> > > --- a/arch/x86/kvm/svm/nested.c
> > > +++ b/arch/x86/kvm/svm/nested.c
> > > @@ -512,9 +512,6 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
> > >  		svm->nested.last_asid = svm->nested.ctl.asid;
> > >  		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> > >  	}
> > > -	/* TODO: optimize unconditional TLB flush/MMU sync */
> > > -	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> > > -	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > >  }
> > >  
> > >  static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
> > > @@ -530,10 +527,6 @@ static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
> > >  	 */
> > >  	if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
> > >  		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> > > -
> > > -	/* TODO: optimize unconditional TLB flush/MMU sync */
> > > -	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> > > -	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > >  }
> > >  
> > >  /*
> > 
> > Assuming that all previous patches are correct, this one should work as well.
> > 
> > However, only very heavy stress testing, including Hyper-V, Windows guests
> > of various types, etc., can give me confidence that there is no ugly bug lurking
> > somewhere.
> 
> I tried booting an L2 and running some workloads like netperf in there.
> I also tried booting an L3.
> 
> I am planning to try and run some testing with a Windows L2 guest. I am
> assuming this exercises the Hyper-V emulation in L1, which could be
> interesting.
> 
> I am not sure if I will be able to test more scenarios though,
> especially Windows as an L1 (and something else as an L2).
> 
> Let me know if you have something specific in mind.


KVM can run itself 'under' Hyper-V (although in this case, when it runs a guest,
that guest is L3 overall, so not really a supported configuration, but still something that
might reveal bugs).
In this case KVM as L1 can take advantage of L0's TLB flush interface.

Stress-testing L3s can also be useful, although in this case L0 doesn't see L3 at all.
Instead, it sees L1 running two different L2s back to back, so the current code will
likely flush everything all the time.


The direct TLB flush that Hyper-V does, especially from L2 to L0, should also be tested;
it's a relatively new feature, so we need to check that L2 actually uses it.

KVM also has its own TLB flush paravirtualization, which can in theory interfere with this.
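
For what it's worth, a quick way to check from inside a Linux guest whether the KVM PV TLB
flush feature is even exposed is to read the KVM feature CPUID leaf. A minimal standalone
sketch (not an existing selftest; it assumes the KVM leaves sit at the usual 0x40000000 base,
which moves when Hyper-V enlightenments are exposed to the guest as well):

#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;
	char sig[13];

	/* Hypervisor signature leaf: "KVMKVMKVM\0\0\0" for KVM. */
	__cpuid(0x40000000, eax, ebx, ecx, edx);
	memcpy(sig, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);
	sig[12] = '\0';
	printf("hypervisor signature: %s\n", sig);

	/* KVM feature leaf: bit 9 of EAX is KVM_FEATURE_PV_TLB_FLUSH. */
	__cpuid(0x40000001, eax, ebx, ecx, edx);
	printf("KVM PV TLB flush %sexposed\n",
	       (eax & (1u << 9)) ? "" : "not ");
	return 0;
}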


It's also worth running a Hyper-V-enabled Windows as a KVM guest and running a guest in it
(Windows, Linux, or anything else). Such a guest will run two L2 VMs: Windows itself and the VM you run inside it.


You can also try other L1 hypervisors, like VirtualBox or VMware, running in a Windows or Linux L1,
which can themselves run a Windows or Linux L2.

You can also test other OSes like the BSDs as L1; they might have a different TLB access pattern and
might reveal something, who knows. These can also run L2s using their own hypervisors.

Running a very old guest (say Windows XP, or some very old Linux) as L2 might also reveal something.

(But don't try to run Win95/98 - these OSes are known to not flush the TLB properly (they don't use
INVLPG when they should), so they don't work well on AMD at all because of this.)

Finally, it might be worth developing a TLB stress test if one doesn't exist yet; a rough sketch of the idea is below.
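
As a starting point, here is a minimal userspace sketch (hypothetical, not an existing KVM
selftest). Run it inside the guest(s) under test: it repeatedly tears down and re-creates a
mapping at the same virtual address, so a missed invalidation would show up as stale
(non-zero) data in a freshly mapped anonymous page:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	for (unsigned long i = 0; i < 1000000; i++) {
		/* Dirty the page so a translation gets cached. */
		memset(buf, 0xaa, page);

		/* Drop the mapping; the old translation must be invalidated. */
		if (munmap(buf, page)) {
			perror("munmap");
			return 1;
		}

		/* Re-map the same virtual address. */
		buf = mmap(buf, page, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		if (buf == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/* A fresh anonymous page must read back as zero. */
		if (buf[0] != 0) {
			fprintf(stderr, "stale data at iteration %lu\n", i);
			return 1;
		}
	}

	puts("no stale translations observed");
	return 0;
}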

Best regards,
   Maxim Levitsky


> 
> > TLB management can be very tricky, so I can't be 100% sure that I haven't missed something.
> > 
> > Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> 
> Thanks!
> 





