On Wed, Oct 12, 2022 at 03:30:26PM -0700, Sagi Shahar <sagis@xxxxxxxxxx> wrote:
> > +int tdx_vm_init(struct kvm *kvm)
> > +{
> > +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > +	cpumask_var_t packages;
> > +	int ret, i;
> > +	u64 err;
> > +
> > +	/* vCPUs can't be created until after KVM_TDX_INIT_VM. */
> > +	kvm->max_vcpus = 0;
>
> The fact that vCPUs can't be created until KVM_TDX_INIT_VM is called
> will make it difficult to implement intra-host migration. See the
> longer discussion below.
...
> Sean, Isaku and I had a short discussion offline regarding the
> interaction between the proposed API in this patch and intra-host
> migration. To summarize:
>
> For intra-host migration you generally want the destination VM to be
> fully initialized, including the right number of vCPUs, before you
> migrate the source VM's state into it.
> The proposed API makes that difficult, since it forces the destination
> VM to call KVM_TDX_INIT_VM before creating vCPUs, which initializes
> TDX state and allocates a new hkid for the destination VM that would
> never be used. This can create a resource limitation on migrating VMs
> where there shouldn't be one.
>
> To solve this issue there are two main proposed changes to the API:
>
> 1. Add a new API based on ioctl(KVM_ENABLE_CAP) to let userspace
> modify the max number of vCPUs:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 43a6a7efc6ec..6055098b025b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6278,6 +6278,18 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		}
>  		mutex_unlock(&kvm->lock);
>  		break;
> +	case KVM_CAP_MAX_VCPUS:
> +		r = -EINVAL;
> +		if (cap->args[0] > KVM_MAX_VCPUS)
> +			break;
> +
> +		mutex_lock(&kvm->lock);
> +		if (!kvm->created_vcpus) {
> +			kvm->max_vcpus = cap->args[0];
> +			r = 0;
> +		}
> +		mutex_unlock(&kvm->lock);
> +		break;
>  	case KVM_CAP_MAX_VCPU_ID:
>  		r = -EINVAL;
>  		if (cap->args[0] > KVM_MAX_VCPU_IDS)
>
> 2. Modify the existing API such that max_vcpus will be set to
> KVM_MAX_VCPUS, like in regular VMs, and have KVM_TDX_INIT_VM fail if
> the user created more vCPUs than the number specified.
>
> For option (1), there are some possible variations:
> 1.a. Do we keep the max_vcpus argument in KVM_TDX_INIT_VM? If so, we
> need to check that max_vcpus matches the number of max_vcpus already
> set, and fail otherwise.
> 1.b. Do we require KVM_ENABLE_CAP_VM(KVM_CAP_MAX_VCPUS) to be called?
> Theoretically, we can set max_vcpus to the KVM default KVM_MAX_VCPUS
> and allow the user to change it as long as no vCPUs have been created.
> If KVM_ENABLE_CAP_VM(KVM_CAP_MAX_VCPUS) is not called, the behavior
> will remain the same as for regular VMs today.
>
> In my opinion, the cleanest solution would be option 1 (the new
> KVM_CAP_MAX_VCPUS API) while removing the max_vcpus argument from
> KVM_TDX_INIT_VM, setting the initial max_vcpus to KVM_MAX_VCPUS, and
> not requiring the new ioctl to be called unless userspace wants to
> specifically limit the number of vCPUs. In that case,
> KVM_CAP_MAX_VCPUS can be called at any time until vCPUs are created.

Regarding KVM_CAP_MAX_VCPUS vs KVM_TDX_INIT_VM: KVM_CAP_MAX_VCPUS is
more generic, so KVM_CAP_MAX_VCPUS would be better. This follows the
precedent of tsc frequency.

If option (1) is adopted, the logic should go into the common code,
i.e. under linux/virt/kvm/, because there is nothing specific to x86.
I don't see any use case other than TDX, though.
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>