On 2/8/2023 9:36 PM, Sean Christopherson wrote:
> On Wed, Feb 08, 2023, Santosh Shukla wrote:
>> On 2/1/2023 3:58 AM, Sean Christopherson wrote:
>>> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>>>> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>>>
>>>>  	vcpu->arch.nmi_injected = events->nmi.injected;
>>>>  	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
>>>> -		vcpu->arch.nmi_pending = events->nmi.pending;
>>>> +		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
>>>> +
>>>>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>>>>
>>>> +	process_nmi(vcpu);
>>>
>>> Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's
>>> ABI that's ugly).  E.g. if we collapse this down, it becomes:
>>>
>>>	process_nmi(vcpu);
>>>
>>>	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>>>		<blah blah blah>
>>>	}
>>>	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>>>
>>>	process_nmi(vcpu);
>>>
>>> And the second mess is that V_NMI needs to be cleared.
>>>
>>
>> Can you please elaborate on the "V_NMI cleared" scenario? Are you referring
>> to V_NMI_MASK or svm->nmi_masked?
>
> V_NMI_MASK.  KVM needs to purge any pending virtual NMIs when userspace sets
> vCPU event state and KVM_VCPUEVENT_VALID_NMI_PENDING is set.
>

As per the APM, V_NMI_MASK is managed by the processor:

"V_NMI_MASK: Indicates whether virtual NMIs are masked. The processor will set
V_NMI_MASK once it takes the virtual NMI. V_NMI_MASK is cleared when the guest
successfully completes an IRET instruction or #VMEXIT occurs while delivering
the virtual NMI."

In my initial implementation I changed V_NMI_MASK for the SMM scenario [1].
That is also not required, as HW will save V_NMI/V_NMI_MASK on SMM entry and
restore them on RSM.
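To make the APM wording above concrete, here is a small userspace model of the V_NMI_MASK lifecycle: the processor sets the mask when it takes the virtual NMI, and clears it on a completed IRET or on a #VMEXIT during delivery. All names here are illustrative stand-ins, not KVM or APM symbols:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the V_NMI_MASK lifecycle described in the APM.
 * These names are illustrative only, not KVM or hardware definitions. */
struct vnmi_model {
	bool v_nmi_pending;	/* V_NMI: a virtual NMI is pending */
	bool v_nmi_mask;	/* V_NMI_MASK: virtual NMIs are masked */
};

/* Processor takes the pending virtual NMI: V_NMI_MASK becomes set. */
static bool vnmi_take(struct vnmi_model *m)
{
	if (!m->v_nmi_pending || m->v_nmi_mask)
		return false;	/* nothing to take, or NMIs are masked */
	m->v_nmi_pending = false;
	m->v_nmi_mask = true;
	return true;
}

/* Guest completes an IRET, or a #VMEXIT occurs while delivering the
 * virtual NMI: the processor clears V_NMI_MASK. */
static void vnmi_iret_or_vmexit(struct vnmi_model *m)
{
	m->v_nmi_mask = false;
}
```

The point of the model is that software never clears the mask directly; only the IRET/#VMEXIT path does, which is why SMM needs no special handling once HW saves/restores the bits itself.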
That said, svm_{get,set}_nmi_mask will look something like this:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a9e9bfbffd72..08911a33cf1e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3635,13 +3635,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 
 static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
 {
-	return to_svm(vcpu)->nmi_masked;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (is_vnmi_enabled(svm))
+		return svm->vmcb->control.int_ctl & V_NMI_MASK;
+	else
+		return svm->nmi_masked;
 }
 
 static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (is_vnmi_enabled(svm))
+		return;
+
 	if (masked) {
 		svm->nmi_masked = true;
 		svm_set_iret_intercept(svm);

Is there any input on the above approach?

[1] https://lore.kernel.org/all/20220810061226.1286-4-santosh.shukla@xxxxxxx/

>>> The first process_nmi() effectively exists to (a) purge nmi_queued and (b) keep
>>> nmi_pending if KVM_VCPUEVENT_VALID_NMI_PENDING is not set.  I think we can just
>>> replace that with a set of nmi_queued, i.e.
>>>
>>>	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>>>		vcpu->arch.nmi_pending = 0;
>>>		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
>>>		process_nmi();
>>>	}
>>>
>> You mean replace the above process_nmi() with kvm_make_request(KVM_REQ_NMI, vcpu), right?
>> I'll try the above proposal.
>
> Yep, if that works.  Actually, that might be a requirement.  There's a
>
>	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>
> lurking below this.  Invoking process_nmi() before NMI blocking is updated could
> result in KVM incorrectly dropping/keeping NMIs.  I don't think it would be a
> problem in practice since KVM saves only one NMI, but userspace could stuff NMIs.
>
> Huh.  The existing code is buggy.  events->nmi.pending is a u8, and
> arch.nmi_pending is an unsigned int.  KVM doesn't cap the incoming value, so
> userspace could set up to 255 pending NMIs.
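The gating in the diff above can be modeled in standalone C: with vNMI enabled the mask lives in hardware state (int_ctl), so reads come from there and software writes are dropped; otherwise the legacy software flag is used. The bit value and struct layout here are assumptions for illustration, not the real VMCB definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standalone model of the proposed svm_{get,set}_nmi_mask() gating.
 * V_NMI_MASK_BIT and struct svm_model are illustrative, not the real
 * VMCB layout. */
#define V_NMI_MASK_BIT	(1u << 3)

struct svm_model {
	bool vnmi_enabled;	/* stand-in for is_vnmi_enabled() */
	bool nmi_masked;	/* software-tracked mask (legacy path) */
	uint32_t int_ctl;	/* stand-in for vmcb->control.int_ctl */
};

static bool model_get_nmi_mask(struct svm_model *svm)
{
	if (svm->vnmi_enabled)
		return svm->int_ctl & V_NMI_MASK_BIT;
	return svm->nmi_masked;
}

static void model_set_nmi_mask(struct svm_model *svm, bool masked)
{
	/* With vNMI the processor owns the mask; software writes are dropped. */
	if (svm->vnmi_enabled)
		return;
	svm->nmi_masked = masked;
}
```

The design choice this captures: once the processor manages V_NMI_MASK, a software write would race with or contradict hardware, so the setter simply becomes a no-op on the vNMI path.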
> The extra weird part is that the extra
> NMIs will get dropped the next time KVM stumbles through process_nmi().
>
> Amusingly, KVM only saves one pending NMI, i.e. in a true migration scenario KVM
> may drop an NMI.
>
>	events->nmi.pending = vcpu->arch.nmi_pending != 0;
>
> The really amusing part is that that code was added by 7460fb4a3400 ("KVM: Fix
> simultaneous NMIs").  The only thing I can figure is that KVM_GET_VCPU_EVENTS was
> somewhat blindly updated without much thought about what should actually happen.
>
> So, can you slide the below in early in the series?  Then in this series, convert
> to the above suggested flow (zero nmi_pending, stuff nmi_queued) in another patch?
>
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
> Date: Wed, 8 Feb 2023 07:44:16 -0800
> Subject: [PATCH] KVM: x86: Save/restore all NMIs when multiple NMIs are
>  pending
>
> Save all pending NMIs in KVM_GET_VCPU_EVENTS, and queue KVM_REQ_NMI if one
> or more NMIs are pending after KVM_SET_VCPU_EVENTS in order to re-evaluate
> pending NMIs with respect to NMI blocking.
>
> KVM allows multiple NMIs to be pending in order to faithfully emulate bare
> metal handling of simultaneous NMIs (on bare metal, truly simultaneous
> NMIs are impossible, i.e. one will always arrive first and be consumed).
> Support for simultaneous NMIs botched the save/restore though.  KVM only
> saves one pending NMI, but allows userspace to restore 255 pending NMIs
> as kvm_vcpu_events.nmi.pending is a u8, and KVM's internal state is stored
> in an unsigned int.
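The save/restore asymmetry described above can be shown with a small model: the internal count is an unsigned int, the ABI field is a u8, the old GET path collapses the count to 0/1, and the SET path stores the u8 uncapped. The function names are mine; only the types and the `!= 0` collapse come from the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the asymmetric KVM_{GET,SET}_VCPU_EVENTS NMI handling
 * discussed above.  Names are illustrative stand-ins. */
static unsigned int nmi_pending;	/* stand-in for vcpu->arch.nmi_pending */

/* Old KVM_GET_VCPU_EVENTS behavior: collapse the count to 0 or 1,
 * so a second pending NMI is lost across a migration. */
static uint8_t model_get_events_nmi_pending(void)
{
	return nmi_pending != 0;
}

/* KVM_SET_VCPU_EVENTS: the incoming u8 is stored uncapped, so userspace
 * can stuff up to 255 pending NMIs into the unsigned int. */
static void model_set_events_nmi_pending(uint8_t pending)
{
	nmi_pending = pending;
}
```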
>
> Fixes: 7460fb4a3400 ("KVM: Fix simultaneous NMIs")
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/x86.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 508074e47bc0..e9339acbf82a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5115,7 +5115,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
> 	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
> 
> 	events->nmi.injected = vcpu->arch.nmi_injected;
> -	events->nmi.pending = vcpu->arch.nmi_pending != 0;
> +	events->nmi.pending = vcpu->arch.nmi_pending;
> 	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
> 
> 	/* events->sipi_vector is never valid when reporting to user space */
> @@ -5202,8 +5202,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> 					     events->interrupt.shadow);
> 
> 	vcpu->arch.nmi_injected = events->nmi.injected;
> -	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
> +	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> 		vcpu->arch.nmi_pending = events->nmi.pending;
> +		if (vcpu->arch.nmi_pending)
> +			kvm_make_request(KVM_REQ_NMI, vcpu);
> +	}
> 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> 
> 	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&

Ok. On top of the above, I am including your suggested change as below:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0855599df65..437a6cea3bc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5201,9 +5201,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 	vcpu->arch.nmi_injected = events->nmi.injected;
 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
-		vcpu->arch.nmi_pending = events->nmi.pending;
-		if (vcpu->arch.nmi_pending)
-			kvm_make_request(KVM_REQ_NMI, vcpu);
+		vcpu->arch.nmi_pending = 0;
+		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
+		kvm_make_request(KVM_REQ_NMI, vcpu);
 	}
 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);

Does that make sense?

> base-commit: 6c77ae716d546d71b21f0c9ee7d405314a3f3f9e
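The end-to-end flow of that change can be sketched as a simplified single-threaded model: SET zeroes nmi_pending, stashes the u8 in nmi_queued, and raises KVM_REQ_NMI; process_nmi() later drains the queue and caps pending at two (one when NMIs are blocked, per KVM's "one running plus one pending" rule). The real fields are atomics and the cap logic has more inputs; this only illustrates the ordering and the cap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the suggested queued->pending flow.  Names mirror
 * the KVM fields but this is an illustration, not kernel code. */
struct vcpu_model {
	unsigned int nmi_pending;
	unsigned int nmi_queued;	/* atomic_t in KVM */
	bool nmi_masked;
	bool req_nmi;			/* stands in for KVM_REQ_NMI */
};

/* KVM_SET_VCPU_EVENTS with KVM_VCPUEVENT_VALID_NMI_PENDING set. */
static void model_set_vcpu_events(struct vcpu_model *v, uint8_t pending, bool masked)
{
	v->nmi_pending = 0;
	v->nmi_queued = pending;	/* atomic_set(&...nmi_queued, ...) */
	v->req_nmi = true;		/* kvm_make_request(KVM_REQ_NMI, vcpu) */
	v->nmi_masked = masked;		/* kvm_x86_set_nmi_mask() runs after */
}

/* process_nmi(), run when the request is serviced, i.e. after NMI
 * blocking has been updated: drain the queue, then cap pending. */
static void model_process_nmi(struct vcpu_model *v)
{
	unsigned int limit = v->nmi_masked ? 1 : 2;

	v->nmi_pending += v->nmi_queued;	/* atomic_xchg(..., 0) */
	v->nmi_queued = 0;
	if (v->nmi_pending > limit)
		v->nmi_pending = limit;
	v->req_nmi = false;
}
```

This shows why deferring process_nmi() via KVM_REQ_NMI matters: the cap is computed only after the new mask state is in place, so 255 stuffed NMIs collapse to at most two, or one if NMIs are masked.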