On Mon, Nov 30, 2020, Paolo Bonzini wrote:
> On 16/09/20 02:19, Sean Christopherson wrote:
> >
> > TDX also selectively blocks/skips portions of other ioctl()s so that the
> > TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event
> > injection restrictions are due to direct injection not being allowed (except
> > for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
> > exception injection is completely disallowed.
> >
> >   kvm_vcpu_ioctl_x86_get_vcpu_events:
> >       if (!vcpu->kvm->arch.guest_state_protected)
> >               events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
>
> Perhaps an alternative implementation can enter the vCPU with immediate exit
> until no events are pending, and then return all zeroes?

This can't work. If the guest has STI blocking, e.g. it did STI->TDVMCALL with
a valid vIRQ in GUEST_RVI, then events->interrupt.shadow should technically be
non-zero to reflect the STI blocking. But, the immediate exit (a hardware IRQ
for TDX guests) will cause VM-Exit before the guest can execute any instructions
and thus the guest will never clear STI blocking and never consume the pending
event. Or there could be a valid vIRQ, but GUEST_RFLAGS.IF=0, in which case KVM
would need to run the guest for an indeterminate amount of time to wait for the
vIRQ to be consumed.

Tangentially related, I haven't looked through the official external TDX docs,
but I suspect that vmcs.GUEST_RVI is listed as inaccessible for production TDs.
This will be changed as the VMM needs access to GUEST_RVI to handle
STI->TDVMCALL(HLT), otherwise the VMM may incorrectly put the vCPU into a
blocked (not runnable) state even though it has a pending wake event.
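
To make the two failure modes concrete, here is a toy userspace model. It is
only a sketch and not KVM or TDX code: struct toy_vcpu and every toy_*()
function are hypothetical names invented for illustration, and the guest state
is reduced to three flags. The assumption it encodes is the one stated above,
i.e. that an entry with immediate exit retires zero guest instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy guest state; hypothetical, loosely mirrors the fields discussed above. */
struct toy_vcpu {
	bool sti_blocking;	/* STI interrupt shadow is active */
	bool rflags_if;		/* models GUEST_RFLAGS.IF */
	bool virq_pending;	/* models a valid vIRQ in GUEST_RVI */
};

bool toy_events_pending(const struct toy_vcpu *v)
{
	return v->sti_blocking || v->virq_pending;
}

/*
 * Model VM-Enter with an immediate exit requested.  Assumption: the exit
 * fires before the guest retires any instruction, so no state changes.
 */
void toy_enter_with_immediate_exit(struct toy_vcpu *v)
{
	assert(v != NULL);
	/* Zero instructions retired: shadow and vIRQ are left untouched. */
}

/*
 * The proposed "enter until no events are pending" loop, bounded so the
 * model terminates.  Returns the number of entries needed, or -1 if events
 * were still pending after max_entries attempts.
 */
int toy_drain_events(struct toy_vcpu *v, int max_entries)
{
	for (int i = 0; i < max_entries; i++) {
		if (!toy_events_pending(v))
			return i;
		toy_enter_with_immediate_exit(v);
	}
	return toy_events_pending(v) ? -1 : max_entries;
}

/*
 * HLT handling: the vCPU should block only if the VMM sees no wake event.
 * Without read access to GUEST_RVI, a pending vIRQ is invisible to the VMM.
 */
bool toy_hlt_should_block(const struct toy_vcpu *v, bool vmm_can_read_rvi)
{
	bool wake_event_visible = vmm_can_read_rvi && v->virq_pending;

	return !wake_event_visible;
}
```

With sti_blocking and virq_pending both set, toy_drain_events() gives up with
-1 no matter how large the bound is, because nothing in the loop can change the
state it is waiting on. Likewise toy_hlt_should_block() returns true for that
same vCPU when vmm_can_read_rvi is false, which is the "blocked despite a
pending wake event" case.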