On 11/5/2024 7:52 AM, Sean Christopherson wrote:
> On Sun, Nov 03, 2024, Manali Shukla wrote:
>> On 10/15/2024 11:19 PM, Sean Christopherson wrote:
>>> On Fri, Oct 04, 2024, Manali Shukla wrote:
>> ...
>>>>
>>>> +static int bus_lock_exit(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	struct vcpu_svm *svm = to_svm(vcpu);
>>>> +
>>>> +	vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
>>>> +	vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;
>>>> +
>>>> +	/*
>>>> +	 * Reload the counter with value greater than '0'.
>>>
>>> The value quite obviously must be exactly '1', not simply greater than '0'. I also
>>> think this is the wrong place to set the counter. Rather than set the counter at
>>> the time of exit, KVM should implement a vcpu->arch.complete_userspace_io callback
>>> and set the counter to '1' if and only if RIP (or LIP, but I have no objection to
>>> keeping things simple) is unchanged. It's a bit of extra complexity, but it will
>>> make it super obvious why KVM is setting the counter to '1'. And, if userspace
>>> wants to stuff state and move past the instruction, e.g. by emulating the guilty
>>> instruction, then KVM won't unnecessarily allow a bus lock in the guest.
>>>
>>> And then the comment can be:
>>>
>>>	/*
>>>	 * If userspace has NOT changed RIP, then KVM's ABI is to let the guest
>>>	 * execute the bus-locking instruction. Set the bus lock counter to '1'
>>>	 * to effectively step past the bus lock.
>>>	 */
>>>
>>
>> The bus lock threshold intercept feature is available for SEV-ES and SEV-SNP
>> guests too. The RIP where the bus lock exit occurred is not available in the
>> bus_lock_exit handler for SEV-ES and SEV-SNP guests, so the above-mentioned
>> solution won't work with SEV-ES and SEV-SNP guests.
>>
>> I would propose to add the above-mentioned solution only for normal and SEV
>> guests, and to unconditionally reload bus_lock_counter to 1 in
>> complete_userspace_io for SEV-ES and SEV-SNP guests.
>
> Yeah, that works. Though I would condition the check on guest_state_protected.
> Actually, and this is going to seem really stupid, but everything will Just Work
> if you use kvm_get_linear_rip() and kvm_is_linear_rip(), because kvm_get_linear_rip()
> returns '0' for vCPUs with protected state. I.e. KVM will do a rather superfluous
> cui() callback, but otherwise it's fine. Silly, but in many ways preferable to
> special casing ES and SNP guests.

Ack.

>
> On a related topic, can you add a refactoring prep patch to move linear_rip out
> of kvm_pio_request and place it next to complete_userspace_io? There's nothing
> port I/O specific about that field, it just so happens that port I/O is the
> only case where KVM's ABI is to let userspace stuff state (to emulate RESET)
> without first completing the I/O instruction.
>

Sure. I will add this refactoring prep patch with v4.

- Manali

> I.e.
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8e8ca6dab2b2..8617b15096a6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -406,7 +406,6 @@ struct kvm_rmap_head {
>  };
>  
>  struct kvm_pio_request {
> -	unsigned long linear_rip;
>  	unsigned long count;
>  	int in;
>  	int port;
> @@ -884,6 +883,7 @@ struct kvm_vcpu_arch {
>  	bool emulate_regs_need_sync_to_vcpu;
>  	bool emulate_regs_need_sync_from_vcpu;
>  	int (*complete_userspace_io)(struct kvm_vcpu *vcpu);
> +	unsigned long cui_linear_rip;
>  
>  	gpa_t time;
>  	struct pvclock_vcpu_time_info hv_clock;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 425a301911a6..7704d3901481 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9308,7 +9308,7 @@ static int complete_fast_pio_out(struct kvm_vcpu *vcpu)
>  {
>  	vcpu->arch.pio.count = 0;
>  
> -	if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.pio.linear_rip)))
> +	if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.cui_linear_rip)))
>  		return 1;
>  
>  	return kvm_skip_emulated_instruction(vcpu);
> @@ -9333,7 +9333,7 @@ static int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size,
>  			complete_fast_pio_out_port_0x7e;
>  		kvm_skip_emulated_instruction(vcpu);
>  	} else {
> -		vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
> +		vcpu->arch.cui_linear_rip = kvm_get_linear_rip(vcpu);
>  		vcpu->arch.complete_userspace_io = complete_fast_pio_out;
>  	}
>  	return 0;
> @@ -9346,7 +9346,7 @@ static int complete_fast_pio_in(struct kvm_vcpu *vcpu)
>  	/* We should only ever be called with arch.pio.count equal to 1 */
>  	BUG_ON(vcpu->arch.pio.count != 1);
>  
> -	if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.pio.linear_rip))) {
> +	if (unlikely(!kvm_is_linear_rip(vcpu, vcpu->arch.cui_linear_rip))) {
>  		vcpu->arch.pio.count = 0;
>  		return 1;
>  	}
> @@ -9375,7 +9375,7 @@ static int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size,
>  		return ret;
>  	}
>  
> -	vcpu->arch.pio.linear_rip = kvm_get_linear_rip(vcpu);
> +	vcpu->arch.cui_linear_rip = kvm_get_linear_rip(vcpu);
>  	vcpu->arch.complete_userspace_io = complete_fast_pio_in;
>  
>  	return 0;
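
For reference, below is a minimal sketch of the complete_userspace_io approach
discussed above, not the posted patch. The helper name complete_userspace_buslock
and the VMCB field svm->vmcb->control.bus_lock_counter are assumptions made here
for illustration; cui_linear_rip is taken from the prep diff quoted above. Because
kvm_get_linear_rip() returns '0' for vCPUs with protected state, the
kvm_is_linear_rip() check trivially passes for SEV-ES/SEV-SNP guests, so the
counter is always reloaded for them, which matches the behavior Sean describes.

static int complete_userspace_buslock(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/*
	 * If userspace has NOT changed RIP, then KVM's ABI is to let the
	 * guest execute the bus-locking instruction.  Set the bus lock
	 * counter to '1' to effectively step past the bus lock.  For vCPUs
	 * with protected state, cui_linear_rip and kvm_get_linear_rip() are
	 * both '0', so the check passes and the counter is always reloaded.
	 */
	if (kvm_is_linear_rip(vcpu, vcpu->arch.cui_linear_rip))
		svm->vmcb->control.bus_lock_counter = 1;	/* assumed field name */

	return 1;
}

static int bus_lock_exit(struct kvm_vcpu *vcpu)
{
	vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
	vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;

	/* Snapshot RIP so the cui callback can tell if userspace moved it. */
	vcpu->arch.cui_linear_rip = kvm_get_linear_rip(vcpu);
	vcpu->arch.complete_userspace_io = complete_userspace_buslock;

	return 0;
}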