Heh, lots of people cc'd, but none of the people whose code this affects.
+s390 and arm folks

On Mon, Apr 11, 2022, Zeng Guang wrote:
> Arch specific KVM common data may require pre-allocation or other
> preprocess ready before vCPU creation at runtime.

Please provide the specific motivation for the move, i.e. explain the desire
to do per-VM allocations based on max_vcpu_ids at the first vCPU creation.

> It's safe to invoke kvm_arch_vcpu_precreate() within the protection of
> kvm->lock directly rather than take into account in the implementation for
> each architecture.

This absolutely needs to explain _why_ it's safe, e.g. only arm64, x86, and
s390 have non-nop implementations, and they're all simple and short with no
tendrils into other code that might take kvm->lock.

And as before, I suspect arm64 needs this protection, the vgic_initialized()
check looks racy.  Though it's hard to tell if doing the check under
kvm->lock actually fixes anything.

> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Zeng Guang <guang.zeng@xxxxxxxxx>
> ---
>  arch/s390/kvm/kvm-s390.c | 2 --
>  virt/kvm/kvm_main.c      | 2 +-

I think it's also worth changing x86's implementation to check created_vcpus
instead of online_vcpus.  That'll fix a race where userspace could never see
the pr_warn() (which is arguably useless, but whatever), e.g. if it creates a
VM with 2 vCPUs and both simultaneously go through kvm_arch_vcpu_precreate().

>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 156d1c25a3c1..5c795bbcf1ea 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -3042,9 +3042,7 @@ static int sca_can_add_vcpu(struct kvm *kvm, unsigned int id)
>  	if (!sclp.has_esca || !sclp.has_64bscao)
>  		return false;
>  
> -	mutex_lock(&kvm->lock);
>  	rc = kvm->arch.use_esca ? 0 : sca_switch_to_extended(kvm);
> -	mutex_unlock(&kvm->lock);
>  
>  	return rc == 0 && id < KVM_S390_ESCA_CPU_SLOTS;
>  }
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 70e05af5ebea..a452e678a015 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3732,9 +3732,9 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>  	}
>  
>  	kvm->created_vcpus++;
> +	r = kvm_arch_vcpu_precreate(kvm, id);

Hmm, so I think I'd prefer this to be invoked before bumping created_vcpus.
The existing implementations don't reference created_vcpus, so there's no
change needed to existing code.  Logically, a pre-create helper should not
see a non-zero count, as the "pre" part strongly implies it's being called
_before_ creating the first vCPU.

Then switching from online_vcpus to created_vcpus in the x86 implementation
doesn't need to have a weird change from "> 0" => "> 1".

Ah, and then it also wouldn't have goofy behavior where it drops and
reacquires kvm->lock on failure just to decrement created_vcpus.  Sketches
of both ideas below the quote.

>  	mutex_unlock(&kvm->lock);
>  
> -	r = kvm_arch_vcpu_precreate(kvm, id);
>  	if (r)
>  		goto vcpu_decrement;
> 
> -- 
> 2.27.0
> 
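
E.g. something like this (completely untested) is what I have in mind for the
common code: keep the call under kvm->lock, but do it before bumping
created_vcpus so that failure is a straight unlock-and-return:

	mutex_lock(&kvm->lock);
	if (kvm->created_vcpus == KVM_MAX_VCPUS) {
		mutex_unlock(&kvm->lock);
		return -EINVAL;
	}

	/*
	 * Invoke the arch hook before bumping created_vcpus so that the
	 * hook for the first vCPU always observes a count of zero, and
	 * so that failure doesn't need to re-take kvm->lock just to
	 * decrement created_vcpus.
	 */
	r = kvm_arch_vcpu_precreate(kvm, id);
	if (r) {
		mutex_unlock(&kvm->lock);
		return r;
	}

	kvm->created_vcpus++;
	mutex_unlock(&kvm->lock);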
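
And then the x86 side of the online_vcpus => created_vcpus switch keeps the
existing "> 0" semantics (again untested, written against my mental model of
x86's kvm_arch_vcpu_precreate()):

int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
	/* created_vcpus is protected by kvm->lock, which is held here. */
	if (kvm_check_tsc_unstable() && kvm->created_vcpus > 0)
		pr_warn_once("kvm: SMP vm created on host with unstable TSC; "
			     "guest TSC will not be reliable\n");

	return 0;
}

Since created_vcpus is bumped only after the hook succeeds, the second vCPU's
precreate is guaranteed to see a non-zero count even if both vCPUs are created
concurrently, i.e. the pr_warn() can no longer be raced past.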