On 14/11/19 21:09, Jim Mattson wrote: > Can someone explain this ioctl to me? The more I look at it, the less > sense it makes to me. It certainly has some historical baggage in it, much like KVM_GET_MSR_INDEX_LIST. But (unlike KVM_GET_MSR_INDEX_LIST) it's mostly okay; the issues you report boil down to one of: 1) KVM_GET_SUPPORTED_CPUID being a system ioctl 2) supporting the simple case of taking the output of KVM_GET_SUPPORTED_CPUID and passing it to KVM_SET_CPUID2 3) CPUID information being poorly designed, or just Intel doing undesirable things > Let's start with leaf 0. If I see 0xd in EAX, does that indicate the > *maximum* supported value in EAX? This is easy, you can always supply a subset of the values to the guest, and this includes reducing the value of integer values (such as the number of leaves) or clearing bits. It should be documented better, possibly including with a list of leaves that can be filled by the VMM as it likes (e.g. cache topology). > Or does that mean that only a value > of 0xd is supported for EAX? If I see "AuthenticAMD" in EBX/EDX/ECX, > does that mean that "GenuineIntel" is *not* supported? I thought > people were having reasonable success with cross-vendor migration. This is (2). But in general passing the host value is the safe choice, everything else has reasonable success but it's not something I would recommend in production (and it's something I wouldn't mind removing, really). > What about leaf 7 EBX? If a bit is clear, does that mean setting the > bit is unsupported? If a bit is set, does that mean clearing the bit > is unsupported? Do those answers still apply for bits 6 and 13, where > a '1' indicates the absence of a feature? Again, clearing bits is always supported in theory, but I say "in theory" because of course bits 6 and 13 are indeed problematic. And unfortunately the only solutions for those is to stick your head in the sand and pretend they don't exist. If bits 6 and 13 were handled strictly, people would not be able to migrate VMs between e.g. Haswell and Ivy Bridge machines within the same fleet, which is something people want to do. So, this is (3). A similar case is CPUID[0Ah].EBX (unavailable architectural events). > What about leaf 0xa? Kvm's api.txt says, "Note that certain > capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may expose cpuid > features (e.g. MONITOR) which are not supported by kvm in its default > configuration. If userspace enables such capabilities, it is > responsible for modifying the results of this ioctl appropriately." > However, it appears that the vPMU is enabled not by such a capability, > but by filling in leaf 0xa. Right, the supported values are provided by KVM_GET_SUPPORTED_CPUID. So as long as you don't zero the PMU version id to 0, PMU is enabled. > How does userspace know what leaf 0xa > values are supported by both the hardware and kvm? Reducing functionality is supported---fewer GP or fixed counters, or disabling events by *setting* bits in EBX or reducing EAX[31:24]. > And as for KVM_CAP_X86_DISABLE_EXITS, in particular, how is userspace > supposed to know what the appropriate modification to > KVM_GET_SUPPORTED_CPUID is? Is this documented somewhere else? > > And as for the "certain capabilities" clause above, should I assume > that any capability enabled by userspace may require modifications to > KVM_GET_SUPPORTED_CPUID? What might those required modifications be? > Has anyone thought to document them, or better yet, provide an API to > get them? And finally this is (1). It should be documented by the individual capabilities or ioctls. With KVM_ENABLE_CAP, the only one that is _absent_ from KVM_GET_SUPPORTED_CPUID the MONITOR bit. The opposite case is X2APIC, which is reported as supported in KVM_GET_SUPPORTED_CPUID even though requires KVM_CREATE_IRQCHIP (or KVM_ENABLE_CAP + KVM_CAP_SPLIT_IRQCHIP). Of course any serious VMM uses in-kernel irqchip, but still it's ugly. Providing an API to get a known-good value of CPUID for the current VM (a KVM_GET_SUPPORTED_CPUID vm-ioctl, basically) would be a fine idea. If anybody is interested, I can think of other cases where this applies. It would provide a better way to do (2), essentially. > What about the processor brand string in leaves 0x80000000-0x80000004? > Is a string of NULs really the only supported value? Indeed this should probably return the host values, at least for consistency with the vendor. > And just a nit, but why does this ioctl bother returning explicitly > zeroed leaves for unsupported leaves in range? No particular reason, it just keeps the code simpler. > It would really be nice if I could use this ioctl to write a > "HostSupportsGuest" function based in part on an existing guest's > CPUID information, but that doesn't seem all that feasible, without > intimate knowledge of how the host's implementation of kvm works. It does not depend that much on knowledge of the host's implementation of KVM. However, it does depend on tiny details of the CPUID bits. Thanks, Paolo