On Fri, Jun 02, 2023, Jim Mattson wrote:
> On Fri, Jun 2, 2023 at 12:16 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Fri, Jun 02, 2023, Jim Mattson wrote:
> > > On Fri, Jun 2, 2023 at 12:18 AM Gao Shiyuan <gaoshiyuan@xxxxxxxxx> wrote:
> > > >
> > > > From: Shiyuan Gao <gaoshiyuan@xxxxxxxxx>
> > > >
> > > > When live-migrate VM on icelake microarchitecture, if the source
> > > > host kernel before commit 2e8cd7a3b828 ("kvm: x86: limit the maximum
> > > > number of vPMU fixed counters to 3") and the dest host kernel after this
> > > > commit, the migration will fail.
> > > >
> > > > The source VM's CPUID.0xA.edx[0..4]=4 that is reported by KVM and
> > > > the IA32_PERF_GLOBAL_CTRL MSR is 0xf000000ff. However the dest VM's
> > > > CPUID.0xA.edx[0..4]=3 and the IA32_PERF_GLOBAL_CTRL MSR is 0x7000000ff.
> > > > This inconsistency leads to migration failure.
> >
> > IMO, this is a userspace bug. KVM provided userspace all the information it needed
> > to know that the target is incompatible (3 counters instead of 4), it's userspace's
> > fault for not sanity checking that the target is compatible.
> >
> > I agree that KVM isn't blame free, but hacking KVM to cover up userspace mistakes
> > everytime a feature appears or disappears across kernel versions or configs isn't
> > maintainable.
>
> Um...
>
> "You may never migrate this VM to a newer kernel. Sucks to be you."

Userspace can fudge/fixup state to migrate the VM.

> That's not very user-friendly.

Heh, I never claimed it was. I don't think KVM should treat this any differently
than if userspace didn't strip a new feature when regurgitating
KVM_GET_SUPPORTED_CPUID, and ended up with VMs that couldn't migrate to *older*
kernels.

The only way this is KVM's responsibility is if KVM's ABI is defined such that
KVM_GET_SUPPORTED_CPUID is strictly "increasing" across kernel versions (on the
same hardware).
I really don't want to go down that route, as that would complicate fixing KVM
bugs, and would pull in things beyond KVM's control. E.g. PCID support is about
to disappear on hardware affected by the recent INVLPG erratum (commit
ce0b15d11ad8 "x86/mm: Avoid incomplete Global INVLPG flushes").
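
To make the "sanity check, then fudge/fixup" idea concrete, here is a rough
sketch of what a userspace pre-migration check could look like. It is not taken
from QEMU or any other VMM; all function names are hypothetical. It assumes the
layout implied by the values quoted above: general-purpose counter enables in
the low bits of IA32_PERF_GLOBAL_CTRL, fixed counter enables starting at bit
32, and the fixed counter count in CPUID.0xA:EDX[4:0]. The count of 8
general-purpose counters is inferred from the 0xff in the quoted MSR values.

```python
# Hypothetical userspace-side check; names are illustrative, not a real VMM API.

FIXED_CTR_BASE_BIT = 32  # fixed-counter enable bits start at bit 32


def fixed_counters_from_cpuid_edx(edx: int) -> int:
    """Number of fixed counters, CPUID.0xA:EDX[4:0]."""
    return edx & 0x1F


def global_ctrl_valid_mask(num_gp: int, num_fixed: int) -> int:
    """Bits of IA32_PERF_GLOBAL_CTRL backed by an actual counter."""
    gp_bits = (1 << num_gp) - 1
    fixed_bits = ((1 << num_fixed) - 1) << FIXED_CTR_BASE_BIT
    return gp_bits | fixed_bits


def unsupported_bits(src_global_ctrl: int, dst_num_gp: int, dst_num_fixed: int) -> int:
    """Bits set in the saved MSR value that the destination cannot accept."""
    return src_global_ctrl & ~global_ctrl_valid_mask(dst_num_gp, dst_num_fixed)


def fixup_global_ctrl(src_global_ctrl: int, dst_num_gp: int, dst_num_fixed: int) -> int:
    """Drop enable bits for counters the destination does not expose."""
    return src_global_ctrl & global_ctrl_valid_mask(dst_num_gp, dst_num_fixed)


if __name__ == "__main__":
    src_msr = 0xF000000FF                          # value saved on the source host
    dst_fixed = fixed_counters_from_cpuid_edx(3)   # destination reports 3

    bad = unsupported_bits(src_msr, dst_num_gp=8, dst_num_fixed=dst_fixed)
    if bad:
        print(f"destination cannot restore bits {bad:#x}")
        print(f"fixed-up value: {fixup_global_ctrl(src_msr, 8, dst_fixed):#x}")
```

Whether silently clearing the extra enable bit is acceptable is a policy
decision for the VMM, since the guest was already told it has four fixed
counters; refusing the migration is the other option the check enables.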