On Tue, Oct 3, 2023 at 5:57 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > > On 10/3/23 17:20, Jim Mattson wrote: > > Define an X86_FEATURE_* flag for > > CPUID.80000021H:EAX.FsGsKernelGsBaseNonSerializing[bit 1], and > > advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID. > ... > > +#define X86_FEATURE_BASES_NON_SERIAL (20*32+ 1) /* "" FSBASE, GSBASE, and KERNELGSBASE are non-serializing */ > > This is failing to differentiate two *VERY* different things. > > FSBASE, GSBASE, and KERNELGSBASE themselves are registers. They have > *NOTHING* to do with serialization. WRFSBASE, for instance is not > serializing. Reading (with RDMSR) or using any of those three registers > is not serializing. > > The *ONLY* thing that relates them to serialization is the WRMSR > instruction which itself is (mostly) architecturally serializing and the > fact that WRMSR has historically been the main way to write those three > registers. > > The AMD docs call this out, which helps. But the changelog, comments > and probably the feature naming need some work. You're right; I was overly terse. I'll elucidate in v2. > Why does this matter, btw? Why do guests need this bit passed through? The business of declaring breaking changes to the architectural specification in a CPUID bit has never made much sense to me. Legacy software that depends on the original architectural specification isn't going to query the CPUID bit, because the CPUID bit didn't exist when it was written. New software probably isn't going to query the CPUID bit, either, because it has to have an implementation that works on newer processors regardless. Why, then, would a developer bother to provide an implementation that only works on older processors *and* the code to select an implementation based on a CPUID bit? Take, for example, CPUID.(EAX=7,ECX=0):EBX[bit 13], which, IIRC, was the first CPUID bit of the "Ha ha; we're changing the architectural specification" category. When Intel introduced this new behavior in Haswell, they broke WIN87EM.DLL in Windows XP (see https://communities.vmware.com/t5/Legacy-User-Blogs/General-Protection-Fault-in-module-WIN87EM-DLL-at-0001-02C6/ta-p/2770422). I know of at least three software packages commonly running in VMs that were broken as a result. The CPUID bit didn't solve any problems, and I doubt that any software queries that bit today. As a hypervisor developer, however, it's not up to me to make value judgments on individual CPUID bits. If a bit indicates an innate characteristic of the hardware, it should be passed through. No one is likely to query CPUID.80000021H.EAX[bit 21] today, but if someone does query the bit in the future, they can reasonably expect that WRMSR({FS,GS,KERNELGS}_BASE) is a serializing operation whenever this bit is clear. Therefore, any hypervisor that doesn't pass the bit through is broken. Sadly, this also means that for a heterogenous migration pool, the hypervisor must set this bit in the guest CPUID if it is set on any host in the pool. Yes, that means that the legacy behavior may sometimes be present in a VM that enumerates the CPUID bit, but that's the best we can do. I'm a little surprised at the pushback, TBH. Are you implying that there is some advantage to *not* passing this bit through?