On Mon, 2025-03-03 at 22:18 +0000, Yosry Ahmed wrote:
> On Fri, Feb 28, 2025 at 09:20:18PM -0500, Maxim Levitsky wrote:
> > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > KVM does not track TLB flush requests for L1 vs. L2. Hence, service
> > > local flushes that target the current context before switching to a
> > > new one. Since ASIDs are tracked per-VMCB, service the flushes before
> > > every VMCB switch.
> > >
> > > This is conceptually similar to how nVMX calls
> > > kvm_service_local_tlb_flush_requests() in
> > > nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(), with the
> > > following differences:
> > >
> > > 1. nVMX tracks the current VPID based on is_guest_mode(), so local TLB
> > >    flushes are serviced before enter_guest_mode() and
> > >    leave_guest_mode(). On the other hand, nSVM tracks the current ASID
> > >    based on the current VMCB, so the TLB flushes are serviced before a
> > >    VMCB switch.
> > >
> > > 2. nVMX only enters and leaves guest mode in
> > >    nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(). Other
> > >    paths like vmx_set_nested_state() and vmx_leave_nested() call into
> > >    these two functions. On the other hand, nSVM open codes the switch
> > >    in functions like svm_set_nested_state() and svm_leave_nested(), so
> > >    servicing the flush in svm_switch_vmcb() is probably most reliable.
> > >
> > > Signed-off-by: Yosry Ahmed <yosry.ahmed@xxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/svm/svm.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 5e7b1c9bfa605..6daa7efa9262b 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -1421,6 +1421,12 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > >  
> > >  void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
> > >  {
> > > +	/*
> > > +	 * ASIDs are tracked per-VMCB. Perform any pending TLB flushes for
> > > +	 * the current VMCB before switching to a new one.
> > > +	 */
> > > +	kvm_service_local_tlb_flush_requests(&svm->vcpu);
> > > +
> > >  	svm->current_vmcb = target_vmcb;
> > >  	svm->vmcb = target_vmcb->ptr;
> > >  }
> > 
> > Note that another difference between SVM and VMX is that this code will
> > only set tlb_ctl in the current vmcb; the actual flush can happen much
> > later, when we do VM entry with this vmcb. E.g. if we are now in L2, the
> > flush will happen when we enter L2 again.
> 
> Right, but I think the internal implementation of the TLB flushes is not
> relevant in this specific instance. Do you think it would be useful to
> mention that here?

I am not sure, to be honest. I mentioned this because in theory there can
be a difference: we might think that we flushed the TLB while in fact we
haven't yet. I am trying my best to think about what hidden problems might
lurk around and surface later.

Not directly related to the above, but I am thinking: I really like the way
the SVM flush works, because it ensures that redundant flushes don't cost
anything. I wonder if we can make the VMX code emulate this, by having an
emulated 'tlb_control' field and then doing the flush (INVEPT) on VM entry.

Best regards,
	Maxim Levitsky

> 
> If we were to document the difference in TLB flush handling between VMX
> and SVM, I think a better place would be at kvm_vcpu_flush_tlb_*(), or
> maybe in kvm_host.h where the vendor callbacks are defined? Not sure.
> 
> > I think that this is correct, but I might be mistaken.
> > 
> > Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> 
> Thanks!