Re: [PATCH v19 037/130] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific

Sean Christopherson <seanjc@xxxxxxxxxx> · Thu, 9 May 2024 09:35:57 -0700

On Mon, Feb 26, 2024, isaku.yamahata@xxxxxxxxx wrote:
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> 
> TDX has its own limitation on the maximum number of vcpus that the guest
> can accommodate.  Allow x86 kvm backend to implement its own KVM_ENABLE_CAP
> handler and implement TDX backend for KVM_CAP_MAX_VCPUS.  user space VMM,
> e.g. qemu, can specify its value instead of KVM_MAX_VCPUS.
> 
> When creating TD (TDH.MNG.INIT), the maximum number of vcpu needs to be
> specified as struct td_params_struct.  and the value is a part of
> measurement.  The user space has to specify the value somehow.  There are
> two options for it.
> option 1. API (Set KVM_CAP_MAX_VCPU) to specify the value (this patch)

When I suggested adding a capability[*], the intent was for the capability to
be generic, not buried in TDX code.  I can't think of any reason why this can't
be supported for all VMs on all architectures.  The only wrinkle is that it'll
require a separate capability since userspace needs to be able to detect that
KVM supports restricting the number of vCPUs, but that'll still be _less_ code.

[*] https://lore.kernel.org/all/YZVsnZ8e7cXls2P2@xxxxxxxxxx

> +static int vt_max_vcpus(struct kvm *kvm)
> +{
> +	if (!kvm)
> +		return KVM_MAX_VCPUS;
> +
> +	if (is_td(kvm))
> +		return min(kvm->max_vcpus, TDX_MAX_VCPUS);
> +
> +	return kvm->max_vcpus;

This is _completely_ orthogonal to allowing userspace to restrict the maximum
number of vCPUs.  And unless I'm missing something, it's also ridiculous and
unnecessary at this time.  KVM x86 limits KVM_MAX_VCPUS to 4096:

  config KVM_MAX_NR_VCPUS
	int "Maximum number of vCPUs per KVM guest"
	depends on KVM
	range 1024 4096
	default 4096 if MAXSMP
	default 1024
	help

whereas the limitation from TDX is apprarently simply due to TD_PARAMS taking
a 16-bit unsigned value:

  #define TDX_MAX_VCPUS  (~(u16)0)

i.e. it will likely be _years_ before TDX's limitation matters, if it ever does.
And _if_ it becomes a problem, we don't necessarily need to have a different
_runtime_ limit for TDX, e.g. TDX support could be conditioned on KVM_MAX_NR_VCPUS
being <= 64k.

So rather than add a bunch of pointless plumbing, just throw in

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 137d08da43c3..018d5b9eb93d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2488,6 +2488,9 @@ static int setup_tdparams(struct kvm *kvm, struct td_params *td_params,
                return -EOPNOTSUPP;
        }
 
+       BUILD_BUG_ON(CONFIG_KVM_MAX_NR_VCPUS <
+                    sizeof(td_params->max_vcpus) * BITS_PER_BYTE);
+
        td_params->max_vcpus = kvm->max_vcpus;
        td_params->attributes = init_vm->attributes;
        /* td_params->exec_controls = TDX_CONTROL_FLAG_NO_RBP_MOD; */