On Fri, Mar 04, 2022, Zeng Guang wrote: > Introduce new max_vcpu_id in KVM for x86 architecture. Userspace > can assign maximum possible vcpu id for current VM session using > KVM_CAP_MAX_VCPU_ID of KVM_ENABLE_CAP ioctl(). > > This is done for x86 only because the sole use case is to guide > memory allocation for PID-pointer table, a structure needed to > enable VMX IPI. > > By default, max_vcpu_id set as KVM_MAX_VCPU_IDS. > > Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx> > Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx> > Signed-off-by: Zeng Guang <guang.zeng@xxxxxxxxx> > --- > arch/x86/include/asm/kvm_host.h | 6 ++++++ > arch/x86/kvm/x86.c | 11 +++++++++++ The new behavior needs to be documented in api.rst. > 2 files changed, 17 insertions(+) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 6dcccb304775..db16aebd946c 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1233,6 +1233,12 @@ struct kvm_arch { > hpa_t hv_root_tdp; > spinlock_t hv_root_tdp_lock; > #endif > + /* > + * VM-scope maximum vCPU ID. Used to determine the size of structures > + * that increase along with the maximum vCPU ID, in which case, using > + * the global KVM_MAX_VCPU_IDS may lead to significant memory waste. > + */ > + u32 max_vcpu_id; This should be max_vcpu_ids. I agree the it _should_ be max_vcpu_id, but KVM's API for this is awful and we're stuck with the plural name. > }; > > struct kvm_vm_stat { > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 4f6fe9974cb5..ca17cc452bd3 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -5994,6 +5994,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > kvm->arch.exit_on_emulation_error = cap->args[0]; > r = 0; > break; > + case KVM_CAP_MAX_VCPU_ID: I think it makes sense to change kvm_vm_ioctl_check_extension() to return the current max, it is a VM-scoped ioctl after all. Amusingly, I think we also need a capability to enumerate that KVM_CAP_MAX_VCPU_ID is writable. > + if (cap->args[0] <= KVM_MAX_VCPU_IDS) { > + kvm->arch.max_vcpu_id = cap->args[0]; This needs to be rejected if kvm->created_vcpus > 0, and that check needs to be done under kvm_lock, otherwise userspace can bump the max ID after KVM allocates per-VM structures and trigger buffer overflow. > + r = 0; > + } else If-elif-else statements need curly braces for all paths if any path needs braces. Probably a moot point for this patch due to the above changes. > + r = -E2BIG; This should be -EINVAL, not -E2BIG. E.g. case KVM_CAP_MAX_VCPU_ID: r = -EINVAL; if (cap->args[0] > KVM_MAX_VCPU_IDS) break; mutex_lock(&kvm->lock); if (!kvm->created_vcpus) { kvm->arch.max_vcpu_id = cap->args[0]; r = 0; } mutex_unlock(&kvm->lock); break; > + break; > default: > r = -EINVAL; > break; > @@ -11067,6 +11074,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) > struct page *page; > int r; > > + if (vcpu->vcpu_id >= vcpu->kvm->arch.max_vcpu_id) > + return -E2BIG; Same here, it should be -EINVAL. > + > vcpu->arch.last_vmentry_cpu = -1; > vcpu->arch.regs_avail = ~0; > vcpu->arch.regs_dirty = ~0; > @@ -11589,6 +11599,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > spin_lock_init(&kvm->arch.hv_root_tdp_lock); > kvm->arch.hv_root_tdp = INVALID_PAGE; > #endif > + kvm->arch.max_vcpu_id = KVM_MAX_VCPU_IDS; > > INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn); > INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn); > -- > 2.27.0 >