On Mon, 2024-11-25 at 14:51 -0800, Sean Christopherson wrote:
> On Mon, Nov 25, 2024, Kai Huang wrote:
> > On Mon, 2024-11-25 at 07:19 -0800, Sean Christopherson wrote:
> > > On Mon, Nov 25, 2024, Binbin Wu wrote:
> > > > On 11/22/2024 4:14 AM, Adrian Hunter wrote:
> > > > [...]
> > > > > - tdx_vcpu_enter_exit() calls guest_state_enter_irqoff()
> > > > >   and guest_state_exit_irqoff() which comments say should be
> > > > >   called from non-instrumentable code but noinstr was removed
> > > > >   at Sean's suggestion:
> > > > >     https://lore.kernel.org/all/Zg8tJspL9uBmMZFO@xxxxxxxxxx/
> > > > >   noinstr is also needed to retain NMI-blocking by avoiding
> > > > >   instrumented code that leads to an IRET which unblocks NMIs.
> > > > >   A later patch set will deal with NMI VM-exits.
> > > > >
> > > > In https://lore.kernel.org/all/Zg8tJspL9uBmMZFO@xxxxxxxxxx, Sean mentioned:
> > > > "The reason the VM-Enter flows for VMX and SVM need to be noinstr is they do things
> > > > like load the guest's CR2, and handle NMI VM-Exits with NMIs blocked. None of
> > > > that applies to TDX. Either that, or there are some massive bugs lurking due to
> > > > missing code."
> > > >
> > > > I don't understand why handle NMI VM-Exits with NMIs blocked doesn't apply to
> > > > TDX. IIUC, similar to VMX, TDX also needs to handle the NMI VM-exit in the
> > > > noinstr section to avoid the unblock of NMIs due to instrumentation-induced
> > > > fault.
> > >
> > > With TDX, SEAMCALL is mechanically a VM-Exit. KVM is the "guest" running in VMX
> > > root mode, and the TDX-Module is the "host", running in SEAM root mode.
> > >
> > > And for TDH.VP.ENTER, if a hardware NMI arrives while the TDX guest is active,
> > > the initial NMI VM-Exit, which consumes the NMI and blocks further NMIs, goes
> > > from SEAM non-root to SEAM root. The SEAMRET from SEAM root to VMX root (KVM)
> > > is effectively a VM-Enter, and does NOT block NMIs in VMX root (at least, AFAIK).
> > >
> > > So trying to handle the NMI "exit" in a noinstr section is pointless because NMIs
> > > are never blocked.
> >
> > I thought NMI remains being blocked after SEAMRET?
>
> No, because NMIs weren't blocked at SEAMCALL.
>
> > The TDX CPU architecture extension spec says:
> >
> > "
> > On transition to SEAM VMX root operation, the processor can inhibit NMI and SMI.
> > While inhibited, if these events occur, then they are tailored to stay pending
> > and be delivered when the inhibit state is removed. NMI and external interrupts
> > can be uninhibited in SEAM VMX-root operation. In SEAM VMX-root operation,
> > MSR_INTR_PENDING can be read to help determine status of any pending events.
> >
> > On transition to SEAM VMX non-root operation using a VM entry, NMI and INTR
> > inhibit states are, by design, updated based on the configuration of the TD VMCS
> > used to perform the VM entry.
> >
> > ...
> >
> > On transition to legacy VMX root operation using SEAMRET, the NMI and SMI
> > inhibit state can be restored to the inhibit state at the time of the previous
> > SEAMCALL and any pending NMI/SMI would be delivered if not inhibited.
> > "
> >
> > Here NMI is inhibited in SEAM VMX root, but is never inhibited in VMX root.
>
> Yep.
>
> > And the last paragraph does say "any pending NMI would be delivered if not
> > inhibited".
>
> That's referring to the scenario where an NMI becomes pending while the CPU is in
> SEAM, i.e. has NMIs blocked.
>
> > But I thought this applies to the case when "NMI happens in SEAM VMX root", but
> > not "NMI happens in SEAM VMX non-root"? I thought the NMI is already
> > "delivered" when CPU is in "SEAM VMX non-root", but I guess I was wrong here..
>
> When an NMI happens in non-root, the NMI is acknowledged by the CPU prior to
> performing VM-Exit. In regular VMX, NMIs are blocked after such VM-Exits. With
> TDX, that blocking happens for SEAM root, but the SEAMRET back to VMX root will
> load interruptibility from the SEAMCALL VMCS, and I don't see any code in the
> TDX-Module that propagates that blocking to SEAMCALL VMCS.

Oh, I didn't read the module code, but was trying to look for clues in the TDX
specs. It was a surprise to me that the VMX case and the TDX case behave
differently in terms of "NMI blocking when exiting to the _host_ VMM". I was
thinking SEAMRET (or hardware in general) should have done something to make
sure of it.

>
> Hmm, actually, this means that TDX has a causality inversion, which may become
> visible with FRED's NMI source reporting. E.g. NMI X arrives in SEAM non-root
> and triggers a VM-Exit. NMI X+1 becomes pending while SEAM root is active.
> TDX-Module SEAMRETs to VMX root, NMIs are unblocked, and so NMI X+1 is delivered
> and handled before NMI X.

Sorry, NMI X was acked by the CPU before NMI X+1, so why is NMI X+1 delivered
before NMI X?

>
> So the TDX-Module needs something like this:
>
> diff --git a/src/td_transitions/td_exit.c b/src/td_transitions/td_exit.c
> index eecfb2e..b5c17c3 100644
> --- a/src/td_transitions/td_exit.c
> +++ b/src/td_transitions/td_exit.c
> @@ -527,6 +527,11 @@ void td_vmexit_to_vmm(uint8_t vcpu_state, uint8_t last_td_exit, uint64_t scrub_m
>          load_xmms_by_mask(tdvps_ptr, xmm_select);
>      }
>
> +    if (<is NMI VM-Exit => SEAMRET)
> +    {
> +        set_guest_inter_blocking_by_nmi();
> +    }
> +
>      // 7. Run the common SEAMRET routine.
>      tdx_vmm_post_dispatching();
>
>
> and then KVM should indeed handle NMI exits prior to leaving the noinstr section.

Yeah, to me it should be done unconditionally, as it gives the same behaviour as
the normal VMX VM-Exit case, where the NMI is left blocked after exiting to the
host VMM. The NMI exit reason is passed to the host VMM anyway.

If the NMI is handled immediately after SEAMRET, KVM won't have any chance to do
additional things before handling the NMI, like below for VMX:

	kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
	// call NMI handling routine
	kvm_after_interrupt(vcpu);

I suppose this should be a concern?
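
To check my reading of the ordering in the X / X+1 example above, here is a toy user-space model in C. It is only a sketch, not kernel or TDX-Module code: `struct nmi_model`, `run_scenario()` and the `propagate_blocking` flag are names made up for illustration, and the model assumes that an NMI latched while NMIs are blocked is delivered as soon as blocking is lifted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical. */
enum { NMI_X = 0, NMI_X_PLUS_1 = 1 };

struct nmi_model {
	bool blocked;     /* NMI blocking in the currently running mode */
	bool pending;     /* an NMI is latched, waiting for unblock */
	int  pending_id;
	int  order[2];    /* order in which the host ends up handling NMIs */
	int  handled;
};

static void host_handle(struct nmi_model *m, int id)
{
	assert(m->handled < 2);
	m->order[m->handled++] = id;
}

/* Deliver the latched NMI to the host if nothing blocks it. */
static void deliver_if_unblocked(struct nmi_model *m)
{
	if (!m->blocked && m->pending) {
		host_handle(m, m->pending_id);
		m->pending = false;
	}
}

/*
 * propagate_blocking == false: SEAMRET restores the (unblocked) state
 * from SEAMCALL time. propagate_blocking == true: the assumed behaviour
 * if the module carried NMI blocking across SEAMRET.
 */
static void run_scenario(bool propagate_blocking, int order[2])
{
	struct nmi_model m = {0};

	/* 1. NMI X hits in SEAM non-root: acked, VM-Exit to SEAM root. */
	m.blocked = true;

	/* 2. NMI X+1 arrives in SEAM root: it can only be latched. */
	m.pending = true;
	m.pending_id = NMI_X_PLUS_1;

	/* 3. SEAMRET to VMX root. */
	m.blocked = propagate_blocking;
	deliver_if_unblocked(&m);

	/* 4. KVM sees the NMI exit reason and runs the handler for X. */
	host_handle(&m, NMI_X);

	/* 5. The handler's IRET unblocks NMIs. */
	m.blocked = false;
	deliver_if_unblocked(&m);

	order[0] = m.order[0];
	order[1] = m.order[1];
}
```

In this model, calling `run_scenario(false, order)` records X+1 before X, which is the inversion described above, while `run_scenario(true, order)` records X first and X+1 only after the handler's IRET.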