On Thu, Feb 27, 2025, Manali Shukla wrote:
> On 2/27/2025 8:02 PM, Sean Christopherson wrote:
> > On Thu, Feb 27, 2025, Manali Shukla wrote:
> >> On 2/26/2025 3:37 AM, Sean Christopherson wrote:
> >>>> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
> >>>>  	if (vnmi)
> >>>>  		kvm_cpu_cap_set(X86_FEATURE_VNMI);
> >>>>
> >>>> +	kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
> >>>
> >>> I am 99% certain this is wrong.  Or at the very least, severely lacking an
> >>> explanation of why it's correct.  If L1 enables Idle HLT but not HLT
> >>> interception, then it is KVM's responsibility to NOT exit to L1 if there is
> >>> a pending V_IRQ or V_NMI.
> >>>
> >>> Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support is
> >>> buggy.  If L1 disables HLT exiting, but it's enabled in KVM, then KVM will
> >>> run L2 with HLT exiting and so it becomes KVM's responsibility to check for
> >>> valid L2 wake events prior to scheduling out the vCPU if L2 executes HLT.
> >>> E.g. nVMX handles this by reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part
> >>> of vmx_has_nested_events().  I don't see the equivalent in nSVM.
> >>>
> >>> Amusingly, that means Idle HLT is actually a bug fix to some extent.  E.g.
> >>> if there is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will
> >>> naturally do the right thing, i.e. not hang the vCPU.
> >>>
> >>> Anyways, for now, I think the easiest and best option is to simply skip full
> >>> nested support for the moment.
> >>
> >> Got it, I see the issue you're talking about.  I'll need to look into it a
> >> bit more to fully understand it.  So yeah, we can hold off on full nested
> >> support for the idle HLT intercept for now.
> >>
> >> Since we are planning to disable idle HLT support for nested guests, should
> >> we do something like this?
> >>
> >> @@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm)
> >>  	if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
> >>  		vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
> >>
> >> +	if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT))
> >> +		vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT);
> >> +
> >>
> >> When recalc_intercepts() copies the intercept values from vmcb01 to vmcb02,
> >> it also copies the idle HLT intercept bit, which is set to 1 in vmcb01.
> >> Normally, this isn't a problem because the HLT intercept takes priority when
> >> it's on.  But if the HLT intercept gets turned off for some reason, the idle
> >> HLT intercept will stay on, which is not what we want.
> >
> > Why don't we want that?
>
> The idle HLT intercept remains '1' for the L2 guest.  Now, when L2 executes HLT
> and there is no pending event, it will still take an idle-HLT exit, even though
> idle HLT was never explicitly enabled for the L2 guest.

Yes, but why is that a problem?  L1 doesn't want to intercept HLT, so KVM isn't
violating the architecture by not intercepting HLT.
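
And even when L2 does take such an "unexpected" idle-HLT exit, nothing bad
happens from L2's perspective: the exit goes to KVM (L0), not to L1, and KVM
handles it like any other halt.  Assuming the exit code and handler wiring
from this series, i.e. roughly:

	/*
	 * In svm_exit_handlers[]; the SVM_EXIT_IDLE_HLT name and the routing
	 * to the common halt emulation are assumptions based on this series,
	 * not a statement about current upstream code.
	 */
	[SVM_EXIT_IDLE_HLT]		= kvm_emulate_halt,

KVM only actually blocks the vCPU if there are no pending wake events, so L2
observes the same behavior it would get with plain HLT exiting.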
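
FWIW, if/when full nested support is revisited, the missing nSVM piece called
out above would presumably be a counterpart to vmx_has_nested_events() that
reports a wake event when vmcb02 has a pending V_IRQ or V_NMI.  A completely
untested sketch (the function name and how it gets wired into KVM's nested ops
are assumptions; the int_ctl bits are the architectural V_IRQ/V_NMI-pending
bits from <asm/svm.h>):

	static bool svm_has_nested_events(struct kvm_vcpu *vcpu, bool for_injection)
	{
		struct vcpu_svm *svm = to_svm(vcpu);
		u32 int_ctl = svm->nested.vmcb02.ptr->control.int_ctl;

		/*
		 * A virtual interrupt or virtual NMI pending in vmcb02 is a
		 * valid wake event for L2, so the vCPU must not be scheduled
		 * out on an L2 HLT if either is pending.
		 */
		return (int_ctl & V_IRQ_MASK) || (int_ctl & V_NMI_PENDING_MASK);
	}

That would cover the "check for valid L2 wake events prior to scheduling out
the vCPU" requirement for the case where L1 disables HLT exiting but KVM runs
L2 with HLT (or idle HLT) exiting enabled.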