Re: [PATCH v19 037/130] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 15/06/2024 12:04 pm, Sean Christopherson wrote:
On Fri, Jun 14, 2024, Kai Huang wrote:
On Tue, 2024-06-04 at 10:48 +0000, Huang, Kai wrote:
On Thu, 2024-05-30 at 16:12 -0700, Sean Christopherson wrote:
On Thu, May 30, 2024, Kai Huang wrote:
On Wed, 2024-05-29 at 16:15 -0700, Sean Christopherson wrote:
In the unlikely event there is a legitimate reason for max_vcpus_per_td being
less than KVM's minimum, then we can update KVM's minimum as needed.  But AFAICT,
that's purely theoretical at this point, i.e. this is all much ado about nothing.

I am afraid we already have a legitimate case: TD partitioning.  Isaku
told me the 'max_vcpus_per_td' is lowed to 512 for the modules with TD
partitioning supported.  And again this is static, i.e., doesn't require
TD partitioning to be opt-in to low to 512.

So what's Intel's plan for use cases that creates TDs with >512 vCPUs?

I checked with TDX module guys.  Turns out the 'max_vcpus_per_td' wasn't
introduced because of TD partitioning, and they are not actually related.

They introduced this to support "topology virtualization", which requires
a table to record the X2APIC IDs for all vcpus for each TD.  In practice,
given a TDX module, the 'max_vcpus_per_td', a.k.a, the X2APIC ID table
size reflects the physical logical cpus that *ALL* platforms that the
module supports can possibly have.

The reason of this design is TDX guys don't believe there's sense in
supporting the case where the 'max_vcpus' for one single TD needs to
exceed the physical logical cpus.

So in short:

- The "max_vcpus_per_td" can be different depending on module versions. In
practice it reflects the maximum physical logical cpus that all the
platforms (that the module supports) can possibly have.

- Before CSPs deploy/migrate TD on a TDX machine, they must be aware of
the "max_vcpus_per_td" the module supports, and only deploy/migrate TD to
it when it can support.

- For TDX 1.5.xx modules, the value is 576 (the previous number 512 isn't
correct); For TDX 2.0.xx modules, the value is larger (>1000).  For future
module versions, it could have a smaller number, depending on what
platforms that module needs to support.  Also, if TDX ever gets supported
on client platforms, we can image the number could be much smaller due to
the "vcpus per td no need to exceed physical logical cpus".

We may ask them to support the case where 'max_vcpus' for single TD
exceeds the physical logical cpus, or at least not to low down the value
any further for future modules (> 2.0.xx modules).  We may also ask them
to give promise to not low the number to below some certain value for any
future modules.  But I am not sure there's any concrete reason to do so?

What's your thinking?

It's a reasonable restriction, e.g. KVM_CAP_NR_VCPUS is already capped at number
of online CPUs, although userspace is obviously allowed to create oversubscribed
VMs.

I think the sane thing to do is document that TDX VMs are restricted to the number
of logical CPUs in the system, have KVM_CAP_MAX_VCPUS enumerate exactly that, and
then sanity check that max_vcpus_per_td is greater than or equal to what KVM
reports for KVM_CAP_MAX_VCPUS. >
Stating that the maximum number of vCPUs depends on the whims TDX module doesn't
provide a predictable ABI for KVM, i.e. I don't want to simply forward TDX's
max_vcpus_per_td to userspace.


This sounds good to me. I think it should be also OK for client too, if TDX ever gets supported for client.

IIUC we can consult the @nr_cpu_ids or num_possible_cpus() to get the "number of logical CPUs in the system". And we can reject to use the TDX module if 'max_vcpus_per_td' turns to be smaller.

I think the relevant question is is whether we should still report "number of logical CPUs in the system" via KVM_CAP_MAX_VCPUS? Because if doing so, this still means the userspace will need to check KVM_CAP_MAX_VCPUS vm extention on per-vm basis.

And if it does, then from userspace's perspective, it actually doesn't matter whether underneath the per-vm KVM_CAP_MAX_VCPUS is limited by TDX or the system cpus (also see below).

The userspace cannot tell the difference anyway. It just needs to change to query KVM_CAP_MAX_VCPUS to per-vm basis.

Or, we could limit this to TDX guest ONLY:

The KVM_CAP_MAX_VCPUS is still global. However for TDX specifically, the userspace should use other way to query the number of LPs the system supports (I assume there should be existing ABI for this?).

But looks this isn't something nice?




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux