On Fri, Jun 14, 2024, Kai Huang wrote: > On Tue, 2024-06-04 at 10:48 +0000, Huang, Kai wrote: > > On Thu, 2024-05-30 at 16:12 -0700, Sean Christopherson wrote: > > > On Thu, May 30, 2024, Kai Huang wrote: > > > > On Wed, 2024-05-29 at 16:15 -0700, Sean Christopherson wrote: > > > > > In the unlikely event there is a legitimate reason for max_vcpus_per_td being > > > > > less than KVM's minimum, then we can update KVM's minimum as needed. But AFAICT, > > > > > that's purely theoretical at this point, i.e. this is all much ado about nothing. > > > > > > > > I am afraid we already have a legitimate case: TD partitioning. Isaku > > > > told me the 'max_vcpus_per_td' is lowed to 512 for the modules with TD > > > > partitioning supported. And again this is static, i.e., doesn't require > > > > TD partitioning to be opt-in to low to 512. > > > > > > So what's Intel's plan for use cases that creates TDs with >512 vCPUs? > > > > I checked with TDX module guys. Turns out the 'max_vcpus_per_td' wasn't > > introduced because of TD partitioning, and they are not actually related. > > > > They introduced this to support "topology virtualization", which requires > > a table to record the X2APIC IDs for all vcpus for each TD. In practice, > > given a TDX module, the 'max_vcpus_per_td', a.k.a, the X2APIC ID table > > size reflects the physical logical cpus that *ALL* platforms that the > > module supports can possibly have. > > > > The reason of this design is TDX guys don't believe there's sense in > > supporting the case where the 'max_vcpus' for one single TD needs to > > exceed the physical logical cpus. > > > > So in short: > > > > - The "max_vcpus_per_td" can be different depending on module versions. In > > practice it reflects the maximum physical logical cpus that all the > > platforms (that the module supports) can possibly have. > > > > - Before CSPs deploy/migrate TD on a TDX machine, they must be aware of > > the "max_vcpus_per_td" the module supports, and only deploy/migrate TD to > > it when it can support. > > > > - For TDX 1.5.xx modules, the value is 576 (the previous number 512 isn't > > correct); For TDX 2.0.xx modules, the value is larger (>1000). For future > > module versions, it could have a smaller number, depending on what > > platforms that module needs to support. Also, if TDX ever gets supported > > on client platforms, we can image the number could be much smaller due to > > the "vcpus per td no need to exceed physical logical cpus". > > > > We may ask them to support the case where 'max_vcpus' for single TD > > exceeds the physical logical cpus, or at least not to low down the value > > any further for future modules (> 2.0.xx modules). We may also ask them > > to give promise to not low the number to below some certain value for any > > future modules. But I am not sure there's any concrete reason to do so? > > > > What's your thinking? It's a reasonable restriction, e.g. KVM_CAP_NR_VCPUS is already capped at number of online CPUs, although userspace is obviously allowed to create oversubscribed VMs. I think the sane thing to do is document that TDX VMs are restricted to the number of logical CPUs in the system, have KVM_CAP_MAX_VCPUS enumerate exactly that, and then sanity check that max_vcpus_per_td is greater than or equal to what KVM reports for KVM_CAP_MAX_VCPUS. Stating that the maximum number of vCPUs depends on the whims TDX module doesn't provide a predictable ABI for KVM, i.e. I don't want to simply forward TDX's max_vcpus_per_td to userspace.