On Tue, May 14, 2024, Kai Huang wrote: > > > On 11/05/2024 2:04 am, Sean Christopherson wrote: > > On Thu, May 09, 2024, Isaku Yamahata wrote: > > > On Fri, May 10, 2024 at 11:19:44AM +1200, Kai Huang <kai.huang@xxxxxxxxx> wrote: > > > > On 10/05/2024 10:52 am, Sean Christopherson wrote: > > > > > On Fri, May 10, 2024, Kai Huang wrote: > > > > > > On 10/05/2024 4:35 am, Sean Christopherson wrote: > > > > > > > KVM x86 limits KVM_MAX_VCPUS to 4096: > > > > > > > > > > > > > > config KVM_MAX_NR_VCPUS > > > > > > > int "Maximum number of vCPUs per KVM guest" > > > > > > > depends on KVM > > > > > > > range 1024 4096 > > > > > > > default 4096 if MAXSMP > > > > > > > default 1024 > > > > > > > help > > > > > > > > > > > > > > whereas the limitation from TDX is apprarently simply due to TD_PARAMS taking > > > > > > > a 16-bit unsigned value: > > > > > > > > > > > > > > #define TDX_MAX_VCPUS (~(u16)0) > > > > > > > > > > > > > > i.e. it will likely be _years_ before TDX's limitation matters, if it ever does. > > > > > > > And _if_ it becomes a problem, we don't necessarily need to have a different > > > > > > > _runtime_ limit for TDX, e.g. TDX support could be conditioned on KVM_MAX_NR_VCPUS > > > > > > > being <= 64k. > > > > > > > > > > > > Actually later versions of TDX module (starting from 1.5 AFAICT), the module > > > > > > has a metadata field to report the maximum vCPUs that the module can support > > > > > > for all TDX guests. > > > > > > > > > > My quick glance at the 1.5 source shows that the limit is still effectively > > > > > 0xffff, so again, who cares? Assert on 0xffff compile time, and on the reported > > > > > max at runtime and simply refuse to use a TDX module that has dropped the minimum > > > > > below 0xffff. > > > > > > > > I need to double check why this metadata field was added. My concern is in > > > > future module versions they may just low down the value. > > > > > > TD partitioning would reduce it much. > > > > That's still not a reason to plumb in what is effectively dead code. Either > > partitioning is opt-in, at which I suspect KVM will need yet more uAPI to express > > the limitations to userspace, or the TDX-module is potentially breaking existing > > use cases. > > The 'max_vcpus_per_td' global metadata fields is static for the TDX module. > If the module supports the TD partitioning, it just reports some smaller > value regardless whether we opt-in TDX partitioning or not. > > I think the point is this 'max_vcpus_per_td' is TDX architectural thing and > kernel should not make any assumption of the value of it. It's not an assumption, it's a requirement. And KVM already places requirements on "hardware", e.g. kvm-intel.ko will refuse to load if the CPU doesn't support a bare mimimum VMX feature set. Refusing to enable TDX because max_vcpus_per_td is unexpectedly low isn't fundamentally different than refusing to enable VMX because IRQ window exiting is unsupported. In the unlikely event there is a legitimate reason for max_vcpus_per_td being less than KVM's minimum, then we can update KVM's minimum as needed. But AFAICT, that's purely theoretical at this point, i.e. this is all much ado about nothing.