On Thu, Feb 27, 2025, Manali Shukla wrote: > On 2/26/2025 3:37 AM, Sean Christopherson wrote: > >> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void) > >> if (vnmi) > >> kvm_cpu_cap_set(X86_FEATURE_VNMI); > >> > >> + kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT); > > > > I am 99% certain this is wrong. Or at the very least, severly lacking an > > explanation of why it's correct. If L1 enables Idle HLT but not HLT interception, > > then it is KVM's responsibility to NOT exit to L1 if there is a pending V_IRQ or > > V_NMI. > > > > Yeah, it's buggy. But, it's buggy in part because *existing* KVM support is buggy. > > If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 with > > HLT exiting and so it becomes KVM's responsibility to check for valid L2 wake events > > prior to scheduling out the vCPU if L2 executes HLT. E.g. nVMX handles this by > > reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events(). I > > don't see the equivalent in nSVM. > > > > Amusingly, that means Idle HLT is actually a bug fix to some extent. E.g. if there > > is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will naturally do > > the right thing, i.e. not hang the vCPU. > > > > Anyways, for now, I think the easiest and best option is to simply skip full nested > > support for the moment. > > > > Got it, I see the issue you're talking about. I'll need to look into it a bit more to > fully understand it. So yeah, we can hold off on full nested support for idle HLT > intercept for now. > > Since we are planning to disable Idle HLT support on nested guests, should we do > something like this ? > > @@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm) > if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu)) > vmcb_clr_intercept(c, INTERCEPT_VMMCALL); > > + if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT)) > + vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT); > + > > When recalc_intercepts copies the intercept values from vmc01 to vmcb02, it also copies > the IDLE HLT intercept bit, which is set to 1 in vmcb01. Normally, this isn't a problem > because the HLT intercept takes priority when it's on. But if the HLT intercept gets > turned off for some reason, the IDLE HLT intercept will stay on, which is not what we > want. Why don't we want that?