On Fri, May 10, 2024 at 11:19:44AM +1200,
"Huang, Kai" <kai.huang@xxxxxxxxx> wrote:

> On 10/05/2024 10:52 am, Sean Christopherson wrote:
> > On Fri, May 10, 2024, Kai Huang wrote:
> > > On 10/05/2024 4:35 am, Sean Christopherson wrote:
> > > > KVM x86 limits KVM_MAX_VCPUS to 4096:
> > > >
> > > >   config KVM_MAX_NR_VCPUS
> > > > 	int "Maximum number of vCPUs per KVM guest"
> > > > 	depends on KVM
> > > > 	range 1024 4096
> > > > 	default 4096 if MAXSMP
> > > > 	default 1024
> > > > 	help
> > > >
> > > > whereas the limitation from TDX is apparently simply due to TD_PARAMS taking
> > > > a 16-bit unsigned value:
> > > >
> > > >   #define TDX_MAX_VCPUS  (~(u16)0)
> > > >
> > > > i.e. it will likely be _years_ before TDX's limitation matters, if it ever does.
> > > > And _if_ it becomes a problem, we don't necessarily need to have a different
> > > > _runtime_ limit for TDX, e.g. TDX support could be conditioned on KVM_MAX_NR_VCPUS
> > > > being <= 64k.
> > >
> > > Actually later versions of TDX module (starting from 1.5 AFAICT), the module
> > > has a metadata field to report the maximum vCPUs that the module can support
> > > for all TDX guests.
> >
> > My quick glance at the 1.5 source shows that the limit is still effectively
> > 0xffff, so again, who cares?  Assert on 0xffff compile time, and on the reported
> > max at runtime and simply refuse to use a TDX module that has dropped the minimum
> > below 0xffff.
>
> I need to double check why this metadata field was added.  My concern is in
> future module versions they may just low down the value.

TD partitioning would reduce it considerably.

> But another way to handle is we can adjust code when that truly happens?
> Might not be ideal for stable kernel situation, though?
>
> > > And we only allow the kvm->max_vcpus to be updated if it's a TDX guest in
> > > the vt_vm_enable_cap().  The reason is we want to avoid unnecessary change
> > > for normal VMX guests.
> >
> > That's a frankly ridiculous reason to bury code in TDX.  Nothing is _forcing_
> > userspace to set KVM_CAP_MAX_VCPUS, i.e. there won't be any change to VMX VMs
> > unless userspace _wants_ there to be a change.
>
> Right.  Anyway allowing userspace to set KVM_CAP_MAX_VCPUS for non-TDX
> guests shouldn't have any issue.
>
> The main reason to bury code in TDX is it needs to additionally check
> tdx_info->max_vcpus_per_td.  We can just do in common code if we avoid that
> TDX specific check.

So we can make it arch-independent.

When creating a VM, tdx_vm_init() can set
kvm->max_vcpus = tdx_info->max_vcpus_per_td.  The check can then be common,
like "if (new max_vcpu > kvm->max_vcpus) error".

Or we can add kvm->hard_max_vcpu or something, so that arch-common code can
have "if (kvm->hard_max_vcpu && new max_vcpu > kvm->hard_max_vcpu) error".

-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
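
To make the second option above concrete, here is a rough, standalone sketch of what an arch-common check might look like. Everything in it is an assumption for illustration only: struct kvm_sketch, the hard_max_vcpus field, tdx_vm_init_sketch() and set_max_vcpus() are made-up names, not existing KVM code, and real code would live in the KVM_CAP_MAX_VCPUS handling rather than in a userspace-compilable file.

```c
#include <errno.h>

/* Hypothetical stand-in for the two relevant fields of struct kvm. */
struct kvm_sketch {
	unsigned int max_vcpus;      /* current per-VM vCPU limit */
	unsigned int hard_max_vcpus; /* 0 means "no arch-imposed cap" (assumed field) */
};

/*
 * Hypothetical arch hook: at VM creation, TDX would record the limit
 * reported by the module (tdx_info->max_vcpus_per_td in the thread)
 * as a hard cap, and start the VM at that limit.
 */
static void tdx_vm_init_sketch(struct kvm_sketch *kvm,
			       unsigned int max_vcpus_per_td)
{
	kvm->hard_max_vcpus = max_vcpus_per_td;
	kvm->max_vcpus = max_vcpus_per_td;
}

/*
 * Hypothetical arch-common handler: reject a request that exceeds the
 * hard cap, if one was set; otherwise accept the new limit.  There is
 * no TDX-specific knowledge here, only the generic hard cap.
 */
static int set_max_vcpus(struct kvm_sketch *kvm, unsigned int new_max)
{
	if (kvm->hard_max_vcpus && new_max > kvm->hard_max_vcpus)
		return -EINVAL;

	kvm->max_vcpus = new_max;
	return 0;
}
```

With this shape, a VM whose hard_max_vcpus is left at 0 (e.g. a plain VMX guest) is unaffected, while a TD can only lower its limit from what tdx_vm_init_sketch() recorded.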