On Thu, 2024-05-30 at 16:12 -0700, Sean Christopherson wrote:
> On Thu, May 30, 2024, Kai Huang wrote:
> > On Wed, 2024-05-29 at 16:15 -0700, Sean Christopherson wrote:
> > > In the unlikely event there is a legitimate reason for max_vcpus_per_td being
> > > less than KVM's minimum, then we can update KVM's minimum as needed.  But AFAICT,
> > > that's purely theoretical at this point, i.e. this is all much ado about nothing.
> >
> > I am afraid we already have a legitimate case: TD partitioning.  Isaku
> > told me the 'max_vcpus_per_td' is lowed to 512 for the modules with TD
> > partitioning supported.  And again this is static, i.e., doesn't require
> > TD partitioning to be opt-in to low to 512.
>
> So what's Intel's plan for use cases that creates TDs with >512 vCPUs?

I checked with the TDX module guys.  It turns out 'max_vcpus_per_td' wasn't
introduced because of TD partitioning, and the two are not actually related.

They introduced it to support "topology virtualization", which requires a
table recording the x2APIC IDs of all vcpus for each TD.  In practice, for
a given TDX module, 'max_vcpus_per_td' (a.k.a. the x2APIC ID table size)
reflects the maximum number of physical logical cpus that *ALL* platforms
supported by that module can possibly have.

The reason for this design is that the TDX guys don't see any sense in
supporting the case where 'max_vcpus' for a single TD needs to exceed the
number of physical logical cpus.

So in short:

- The "max_vcpus_per_td" can differ between module versions.  In practice
  it reflects the maximum number of physical logical cpus that all the
  platforms (that the module supports) can possibly have.

- Before CSPs deploy/migrate a TD to a TDX machine, they must be aware of
  the "max_vcpus_per_td" that the machine's module supports, and only
  deploy/migrate the TD there when the module can support it.

- For TDX 1.5.xx modules, the value is 576 (the previous number 512 isn't
  correct); for TDX 2.0.xx modules, the value is larger (>1000).
  For future module versions, it could be a smaller number, depending on
  what platforms that module needs to support.  Also, if TDX ever gets
  supported on client platforms, we can imagine the number could be much
  smaller, because of the "vcpus per TD need not exceed the physical
  logical cpus" reasoning.

We may ask them to support the case where 'max_vcpus' for a single TD
exceeds the physical logical cpus, or at least not to lower the value any
further for future modules (> 2.0.xx modules).  We may also ask them to
promise not to lower the number below a certain value for any future
modules.  But I am not sure there's any concrete reason to do so?

What's your thinking?
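
For illustration only, the deploy/migrate constraint from the second bullet
above could be sketched on the CSP side roughly as below.  This is a minimal
sketch under stated assumptions, not anything KVM or the TDX module provides:
`TdxHost`, `can_host_td` and `eligible_hosts` are hypothetical names, and the
scheduler is assumed to already know each host's 'max_vcpus_per_td'.  The 576
is the TDX 1.5.xx value mentioned above; the 1024 is only a placeholder for
"larger (>1000)", not a real module value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdxHost:
    """Hypothetical per-machine record kept by a CSP scheduler."""
    name: str
    max_vcpus_per_td: int  # limit of the TDX module loaded on this host


def can_host_td(host: TdxHost, td_vcpus: int) -> bool:
    """Return True if a TD with td_vcpus vCPUs fits the host's module limit."""
    if td_vcpus < 1:
        raise ValueError("a TD needs at least one vCPU")
    return td_vcpus <= host.max_vcpus_per_td


def eligible_hosts(hosts, td_vcpus):
    """Filter deploy/migration targets down to hosts that can run the TD."""
    return [h for h in hosts if can_host_td(h, td_vcpus)]


# 576: TDX 1.5.xx value from this thread.
# 1024: placeholder standing in for ">1000" (assumption, not a real value).
hosts = [TdxHost("host-a", 576), TdxHost("host-b", 1024)]
```

With these placeholder hosts, a TD with 600 vcpus could only be placed on (or
migrated to) "host-b", while a TD with 576 vcpus or fewer fits on either.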