On Mon, Nov 25, 2024, Binbin Wu wrote:
> On 11/22/2024 4:14 AM, Adrian Hunter wrote:
> [...]
> >    - tdx_vcpu_enter_exit() calls guest_state_enter_irqoff()
> >      and guest_state_exit_irqoff() which comments say should be
> >      called from non-instrumentable code but noinst was removed
> >      at Sean's suggestion:
> >      	https://lore.kernel.org/all/Zg8tJspL9uBmMZFO@xxxxxxxxxx/
> >      noinstr is also needed to retain NMI-blocking by avoiding
> >      instrumented code that leads to an IRET which unblocks NMIs.
> >      A later patch set will deal with NMI VM-exits.
> >
> In https://lore.kernel.org/all/Zg8tJspL9uBmMZFO@xxxxxxxxxx, Sean mentioned:
> "The reason the VM-Enter flows for VMX and SVM need to be noinstr is they do things
> like load the guest's CR2, and handle NMI VM-Exits with NMIs blocks.  None of
> that applies to TDX.  Either that, or there are some massive bugs lurking due to
> missing code."
>
> I don't understand why handle NMI VM-Exits with NMIs blocks doesn't apply to
> TDX.  IIUIC, similar to VMX, TDX also needs to handle the NMI VM-exit in the
> noinstr section to avoid the unblock of NMIs due to instrumentation-induced
> fault.

With TDX, SEAMCALL is mechanically a VM-Exit.  KVM is the "guest" running in
VMX root mode, and the TDX-Module is the "host", running in SEAM root mode.

And for TDH.VP.ENTER, if a hardware NMI arrives while the TDX guest is active,
the initial NMI VM-Exit, which consumes the NMI and blocks further NMIs, goes
from SEAM non-root to SEAM root.  The SEAMRET from SEAM root to VMX root (KVM)
is effectively a VM-Enter, and does NOT block NMIs in VMX root (at least, AFAIK).

So trying to handle the NMI "exit" in a noinstr section is pointless because
NMIs are never blocked.

TDX is also different because KVM isn't responsible for context switching guest
state.  Specifically, CR2 is managed by the TDX-Module, and so there is no window
where KVM runs with guest CR2, and thus there is no risk of clobbering guest CR2
with a host value, e.g.
due to taking a #PF due to instrumentation triggering something.

All that said, I did forget that code that runs between guest_state_enter_irqoff()
and guest_state_exit_irqoff() can't be instrumented.  And at least as of patch 2
in this series, the simplest way to make that happen is to tag
tdx_vcpu_enter_exit() as noinstr.

Just please make sure nothing else is added in the noinstr section unless it
absolutely needs to be there.
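
For illustration only, a minimal userspace sketch of the shape being asked for:
tdx_vcpu_enter_exit() tagged noinstr, with nothing in it beyond the enter/exit
bracket and the entry call itself.  This is not the actual patch.  The noinstr
macro below is only a rough approximation of the kernel's annotation, the
guest_state_*_irqoff() helpers are stand-ins that merely track a flag, and
struct vcpu_tdx_sketch, tdh_vp_enter_sketch() and the _sketch suffix on the
function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: userspace approximation.  The kernel's noinstr also places
 * the function in .noinstr.text and disables tracing/sanitizer hooks.
 */
#ifndef noinstr
#define noinstr __attribute__((no_instrument_function))
#endif

/* Stand-in state: true only between the enter and exit helpers. */
static bool in_guest_state;

/* Stand-ins for the kernel helpers of the same name (both take void). */
static inline void guest_state_enter_irqoff(void)
{
	assert(!in_guest_state);
	in_guest_state = true;
}

static inline void guest_state_exit_irqoff(void)
{
	assert(in_guest_state);
	in_guest_state = false;
}

/* Hypothetical stand-in for the per-vCPU TDX state. */
struct vcpu_tdx_sketch {
	uint64_t vp_enter_ret;
};

/* Hypothetical stand-in for the SEAMCALL wrapper; always "succeeds". */
static inline uint64_t tdh_vp_enter_sketch(struct vcpu_tdx_sketch *tdx)
{
	(void)tdx;
	assert(in_guest_state);	/* must run inside the bracket */
	return 0;
}

/*
 * Only the code that must run between the enter and exit helpers lives
 * here; everything else stays outside the noinstr function.
 */
static noinstr void tdx_vcpu_enter_exit_sketch(struct vcpu_tdx_sketch *tdx)
{
	guest_state_enter_irqoff();
	tdx->vp_enter_ret = tdh_vp_enter_sketch(tdx);
	guest_state_exit_irqoff();
}
```

Any exit handling, including whatever a later series does for NMIs, would sit
in the caller after tdx_vcpu_enter_exit_sketch() returns, i.e. in normal,
instrumentable code.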