On Fri, Jun 03, 2022, mike tancsa wrote:
> On 6/2/2022 5:46 PM, Sean Christopherson wrote:
> > On Thu, Jun 02, 2022, mike tancsa wrote:
> > > On 6/2/2022 8:42 AM, Igor Mammedov wrote:
> > > > On Tue, 31 May 2022 13:00:07 -0400
> > > > mike tancsa <mike@xxxxxxxxxx> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I have been using kvm since the Ubuntu 18 and 20.x LTS series of
> > > > > kernels and distributions without any issues on a whole range of Guests
> > > > > up until now. Recently, we spun up an Ubuntu LTS 22 hypervisor to add to
> > > > > the mix and eventually upgrade to. Hardware is a series of Ryzen 7 CPUs
> > > > > (3700x). Migrations back and forth without issue for Ubuntu 20.x
> > > > > kernels. The first Ubuntu 22 machine was on identical hardware and all
> > > > > was good with that too. The second Ubuntu 22 based machine was spun up
> > > > > with a newer gen Ryzen, a 5800x. On the initial kernel version that
> > > > > came with that release back in April, migrations worked as expected
> > > > > between hardware as well as different kernel versions and qemu / KVM
> > > > > versions that come default with the distribution. Not sure if migrations
> > > > > between kernel and KVM versions "accidentally" worked all these years,
> > > > > but they did. However, we ran into an issue with the kernel
> > > > > 5.15.0-33-generic (possibly with 5.15.0-30 as well) thats part of
> > > > > Ubuntu. Migrations no longer worked to older generation CPUs. I could
> > > > > send a guest TO the box and all was fine, but upon sending the guest to
> > > > > another hypervisor, the sender would see it as successfully migrated,
> > > > > but the VM would typically just hang, with 100% CPU utilization, or
> > > > > sometimes crash. I tried a 5.18 kernel from May 22nd and again the
> > > > > behavior is different. If I specify the CPU as EPYC or EPYC-IBPB, I can
> > > > > migrate back and forth.
> > > > perhaps you are hitting issue fixed by:
> > > > https://lore.kernel.org/lkml/CAJ6HWG66HZ7raAa+YK0UOGLF+4O3JnzbZ+a-0j8GNixOhLk9dA@xxxxxxxxxxxxxx/T/
> > > >
> > > Thanks for the response. I am not sure.
> >
> > I suspect Igor is right. PKRU/PKU, the offending XSAVE feature in that bug, is
> > in the "new in 5800" list below, and that bug fix went into v5.17, i.e. should
> > also be fixed in v5.18.
> >
> > Unfortunately, there's no Fixes: provided and I'm having a hell of a time trying
> > to figure out when the bug was actually introduced. The v5.15 code base is quite
> > different due to a rather massive FPU rework in v5.16. That fix definitely would
> > not apply cleanly, but it doesn't mean that the underlying root cause is different,
> > e.g. the buggy code could easily have been lurking for multiple kernel versions
> > before the rework in v5.16.
> >
> > > That patch is from Feb. Would the bug have been introduced sometime in May to
> > > the 5.15 kernel than Ubuntu 22 would have tracked ?
> >
> > Dates don't necessarily mean a whole lot when it comes to stable kernels, e.g.
> > it's not uncommon for a change to be backported to a stable kernel weeks/months
> > after it initially landed in the upstream tree.
> >
> > Is moving to v5.17 or later an option for you? If not, what was the "original"
> > Ubuntu 22 kernel version that worked? Ideally, assuming it's the same FPU/PKU bug,
> > the fix would be backported to v5.15, but that's likely going to be quite difficult,
> > especially without knowing exactly which commit introduced the bug.
>
> Thanks Sean, I can, but it just means adjusting our work flow a bit. For our
> hypervisors we like to just track LTS and be conservative in what software
> we install and stick with apps and kernels designed specifically to work
> with that release / distribution.

Yeah, tracking LTS is the right thing to do.
I'll try to verify and bisect the bug, and then get the fix backported to
v5.15.y, but it may be a week or two before that happens.

> The Ubuntu 22 kernel that worked back in April was 5.15.0-25-generic. TBH,
> if I am told we were just lucky things worked with different hardware and
> different kernels and KVM versions (ie. migrating bidirectionally from
> ubuntu 20.x to 22.x) I would be fine with that too. But I was a little
> surprised that a kernel version bump from 5.15 would break what was working.

Migrating between kernel/KVM versions is absolutely supposed to work, this is
firmly a kernel bug.