On 1/25/23 23:09, Jim Mattson wrote:
The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a
decade* have been passed through unmodified from the host. They have
never made sense for KVM_SET_CPUID2, with the unlikely exception of a
whole-host VM.
True, unfortunately people have not read the nonexistent documentation
and they are:
1) rarely adjusting correctly all of 0xB, 0x1F and 0x8000001E;
2) never bounding CPUID[EAX=0].EAX to a known CPUID leaf, resulting for
example in inconsistencies between 0xB and 0x1F.
*But* (2) should not be needed unless you care about maintaining
homogeneous CPUID within a VM migration pool. For something like
kvmtool, having to do (2) would be a workaround for the bug that this
patch fixes.
Our VMM populates the topology of the guest CPUID table on its own, as
any VMM must. However, it uses the host topology (which
KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a
decade*) to see if the requested guest topology is possible.
Ok, thanks; this is useful to know.
Changing a long-established ABI in a way that breaks userspace
applications is a bad practice. I didn't think we, as a community, did
that. I didn't realize that we were only catering to open source
implementations here.
We aren't. But the open source implementations provide some guidance as
to how the API is being used in the wild, and what the pitfalls are.
You wrote it yourself: any VMM must either populate the topology on its
own, or possibly fill it with zeros. Returning a value that is
extremely unlikely to be used is worse in pretty much every way (apart
from not breaking your VMM, of course).
With a total of six known users (QEMU, crosvm, kvmtool, firecracker,
rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and
damned if it doesn't. There is a tension between fixing the one VMM
that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly,
and fixing 3-4 that were silently broken and are now fixed. I will
probably send a patch to crosvm, though.
The VMM being _proprietary_ doesn't really matter, however it does
matter to me that it is not _public_: it is only used within Google, and
the breakage is neither hard to fix in the VMM nor hard to temporarily
avoid by reverting the patch in the Google kernel.
Paolo