Re: [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support

Sean Christopherson <seanjc@xxxxxxxxxx> · Thu, 27 Feb 2025 06:32:55 -0800

On Thu, Feb 27, 2025, Manali Shukla wrote:
> On 2/26/2025 3:37 AM, Sean Christopherson wrote:
> >> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
> >>  		if (vnmi)
> >>  			kvm_cpu_cap_set(X86_FEATURE_VNMI);
> >>  
> >> +		kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
> > 
> > I am 99% certain this is wrong.  Or at the very least, severly lacking an
> > explanation of why it's correct.  If L1 enables Idle HLT but not HLT interception,
> > then it is KVM's responsibility to NOT exit to L1 if there is a pending V_IRQ or
> > V_NMI.
> > 
> > Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support is buggy.
> > If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 with
> > HLT exiting and so it becomes KVM's responsibility to check for valid L2 wake events
> > prior to scheduling out the vCPU if L2 executes HLT.  E.g. nVMX handles this by
> > reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events().  I
> > don't see the equivalent in nSVM.
> > 
> > Amusingly, that means Idle HLT is actually a bug fix to some extent.  E.g. if there
> > is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will naturally do
> > the right thing, i.e. not hang the vCPU.
> > 
> > Anyways, for now, I think the easiest and best option is to simply skip full nested
> > support for the moment.
> > 
> 
> Got it, I see the issue you're talking about. I'll need to look into it a bit more to
> fully understand it. So yeah, we can hold off on full nested support for idle HLT 
> intercept for now.
> 
> Since we are planning to disable Idle HLT support on nested guests, should we do
> something like this ?
> 
> @@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm)
>         if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
>                 vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
> 
> +       if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT))
> +               vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT);
> +
> 
> When recalc_intercepts copies the intercept values from vmc01 to vmcb02, it also copies
> the IDLE HLT intercept bit, which is set to 1 in vmcb01. Normally, this isn't a problem 
> because the HLT intercept takes priority when it's on. But if the HLT intercept gets 
> turned off for some reason, the IDLE HLT intercept will stay on, which is not what we
> want.

Why don't we want that?