On Mon, Jun 17, 2024 at 05:07:56PM +0800, Binbin Wu wrote:
>
>
> On 6/17/2024 4:07 PM, Yuan Yao wrote:
> > On Mon, Feb 26, 2024 at 12:26:27AM -0800, isaku.yamahata@xxxxxxxxx wrote:
> > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > >
> > > This corresponds to VMX __vmx_complete_interrupts().  Because TDX
> > > virtualize vAPIC, KVM only needs to care NMI injection.
> > >
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > > Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > Reviewed-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> > > ---
> > > v19:
> > > - move tdvps_management_check() to this patch
> > > - typo: complete -> Complete in short log
> > > ---
> > >  arch/x86/kvm/vmx/tdx.c | 10 ++++++++++
> > >  arch/x86/kvm/vmx/tdx.h |  4 ++++
> > >  2 files changed, 14 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index 83dcaf5b6fbd..b8b168f74dfe 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -535,6 +535,14 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > >  	 */
> > >  }
> > >
> > > +static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
> > > +{
> > > +	/* Avoid costly SEAMCALL if no nmi was injected */
> > > +	if (vcpu->arch.nmi_injected)
> > > +		vcpu->arch.nmi_injected = td_management_read8(to_tdx(vcpu),
> > > +						      TD_VCPU_PEND_NMI);
> > > +}
> > Looks this leads to NMI injection delay or even won't be
> > reinjected if KVM_REQ_EVENT is not set on the target cpu
> > when more than 1 NMIs are pending there.
> >
> > On normal VM, KVM uses NMI window vmexit for injection
> > successful case to rasie the KVM_REQ_EVENT again for remain
> > pending NMIs, see handle_nmi_window(). KVM also checks
> > vectoring info after VMEXIT for case that the NMI is not
> > injected successfully in this vmentry vmexit round, and
> > raise KVM_REQ_EVENT to try again, see __vmx_complete_interrupts().
> >
> > In TDX, consider there's no way to get vectoring info or
> > handle nmi window vmexit, below checking should cover both
> > scenarios for NMI injection:
> >
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index e9c9a185bb7b..9edf446acd3b 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -835,9 +835,12 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> >  static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
> >  {
> >  	/* Avoid costly SEAMCALL if no nmi was injected */
> > -	if (vcpu->arch.nmi_injected)
> > +	if (vcpu->arch.nmi_injected) {
> >  		vcpu->arch.nmi_injected = td_management_read8(to_tdx(vcpu),
> >  						TD_VCPU_PEND_NMI);
> > +		if (vcpu->arch.nmi_injected || vcpu->arch.nmi_pending)
> > +			kvm_make_request(KVM_REQ_EVENT, vcpu);
> For nmi_injected, it should be OK because TD_VCPU_PEND_NMI is still set.
> But for nmi_pending, it should be checked and raise event.

Right, I just forgot the TDX module can do more than "hardware":

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e9c9a185bb7b..3530a4882efc 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -835,9 +835,16 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
 {
 	/* Avoid costly SEAMCALL if no nmi was injected */
-	if (vcpu->arch.nmi_injected)
+	if (vcpu->arch.nmi_injected) {
 		vcpu->arch.nmi_injected = td_management_read8(to_tdx(vcpu),
 						TD_VCPU_PEND_NMI);
+		/*
+		 * The TDX module retries the injection while TD_VCPU_PEND_NMI
+		 * is set, so no need to set KVM_REQ_EVENT for it again.
+		 */
+		if (!vcpu->arch.nmi_injected && vcpu->arch.nmi_pending)
+			kvm_make_request(KVM_REQ_EVENT, vcpu);
+	}
 }

>
> I remember there was a discussion in the following link:
> https://lore.kernel.org/kvm/20240402065254.GY2444378@xxxxxxxxxxxxxxxxxxxxx/
> It said tdx_vcpu_run() will ignore force_immediate_exit.
> If force_immediate_exit is igored for TDX, then the nmi_pending handling
> could still be delayed if the previous NMI was injected successfully.

Yes, not sure how likely this is to happen in a real use case, but I know
it happens in some testing, e.g. the kvm unit test's multiple NMI testing.

> >
> > +	}
> >  }
> >
> > > +
> > >  struct tdx_uret_msr {
> > >  	u32 msr;
> > >  	unsigned int slot;
> > > @@ -663,6 +671,8 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
> > >  	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
> > >  	trace_kvm_exit(vcpu, KVM_ISA_VMX);
> > >
> > > +	tdx_complete_interrupts(vcpu);
> > > +
> > >  	return EXIT_FASTPATH_NONE;
> > >  }
> > >
> > > diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> > > index 44eab734e702..0d8a98feb58e 100644
> > > --- a/arch/x86/kvm/vmx/tdx.h
> > > +++ b/arch/x86/kvm/vmx/tdx.h
> > > @@ -142,6 +142,8 @@ static __always_inline void tdvps_vmcs_check(u32 field, u8 bits)
> > >  		       "Invalid TD VMCS access for 16-bit field");
> > >  }
> > >
> > > +static __always_inline void tdvps_management_check(u64 field, u8 bits) {}
> > > +
> > >  #define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass)			\
> > >  static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx,	\
> > >  							 u32 field)		\
> > > @@ -200,6 +202,8 @@ TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs);
> > >  TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs);
> > >  TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs);
> > >
> > > +TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management);
> > > +
> > >  static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u32 field)
> > >  {
> > >  	struct tdx_module_args out;
> > > --
> > > 2.25.1
> > >
> >
>
>
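
For reference, this is only a sketch of how the complete
tdx_complete_interrupts() would read with the change proposed above applied.
It is not a standalone snippet: td_management_read8() (built by
TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management)), the TD_VCPU_PEND_NMI
field and to_tdx() all come from the patches quoted in this thread.

static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
{
	/* Avoid costly SEAMCALL if no NMI was injected before TD entry */
	if (vcpu->arch.nmi_injected) {
		/*
		 * TD_VCPU_PEND_NMI still set means the TDX module has not
		 * delivered the NMI yet and will keep retrying on later TD
		 * entries, so keep nmi_injected and don't raise
		 * KVM_REQ_EVENT for it.
		 */
		vcpu->arch.nmi_injected = td_management_read8(to_tdx(vcpu),
							      TD_VCPU_PEND_NMI);
		/*
		 * Injection completed: if more NMIs are queued, ask KVM to
		 * re-evaluate pending events so the next NMI gets injected
		 * via TD_VCPU_PEND_NMI on a later entry.
		 */
		if (!vcpu->arch.nmi_injected && vcpu->arch.nmi_pending)
			kvm_make_request(KVM_REQ_EVENT, vcpu);
	}
}

Compared to the earlier proposal, which also raised KVM_REQ_EVENT when
nmi_injected stayed set, this version relies on the TDX module's own retry
via TD_VCPU_PEND_NMI and only raises the request once the previous NMI has
actually been delivered and more NMIs are still pending.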