Re: Guest migration between different Ryzen CPU generations

On 6/2/2022 5:46 PM, Sean Christopherson wrote:
On Thu, Jun 02, 2022, mike tancsa wrote:
On 6/2/2022 8:42 AM, Igor Mammedov wrote:
On Tue, 31 May 2022 13:00:07 -0400
mike tancsa <mike@xxxxxxxxxx> wrote:

Hello,

       I have been using kvm since the Ubuntu 18 and 20.x LTS series of
kernels and distributions without any issues on a whole range of Guests
up until now. Recently, we spun up an Ubuntu LTS 22 hypervisor to add to
the mix and eventually upgrade to. Hardware is a series of Ryzen 7 CPUs
(3700x).  Migrations back and forth without issue for Ubuntu 20.x
kernels.  The first Ubuntu 22 machine was on identical hardware and all
was good with that too. The second Ubuntu 22 based machine was spun up
with a newer gen Ryzen, a 5800x.  On the initial kernel version that
came with that release back in April, migrations worked as expected
between hardware as well as different kernel versions and qemu / KVM
versions that come default with the distribution. Not sure if migrations
between kernel and KVM versions "accidentally" worked all these years,
but they did.  However, we ran into an issue with the kernel
5.15.0-33-generic (possibly with 5.15.0-30 as well) that's part of
Ubuntu.  Migrations no longer worked to older generation CPUs.  I could
send a guest TO the box and all was fine, but upon sending the guest to
another hypervisor, the sender would see it as successfully migrated,
but the VM would typically just hang, with 100% CPU utilization, or
sometimes crash.  I tried a 5.18 kernel from May 22nd and again the
behavior is different. If I specify the CPU as EPYC or EPYC-IBPB, I can
migrate back and forth.
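
For readers who want to try the workaround described above (pinning the guest to a named CPU model such as EPYC-IBPB), here is a minimal sketch. It assumes the guests are managed through libvirt, which the message does not state; the helper name `cpu_element` is hypothetical. It only builds the `<cpu>` fragment of a domain definition and does not talk to a hypervisor.

```python
import xml.etree.ElementTree as ET


def cpu_element(model):
    """Build a libvirt-style <cpu> element that pins the guest to one named model.

    Assumption: libvirt domain XML is in use. fallback='forbid' asks libvirt to
    refuse to start the guest rather than silently substitute another model.
    """
    cpu = ET.Element("cpu", {"mode": "custom", "match": "exact"})
    model_node = ET.SubElement(cpu, "model", {"fallback": "forbid"})
    model_node.text = model
    return cpu


# EPYC-IBPB is one of the two models the message reports as migrating cleanly.
xml_text = ET.tostring(cpu_element("EPYC-IBPB"), encoding="unicode")
print(xml_text)
```

The idea is that every hypervisor in the pool presents the same, older feature set to the guest, so features that exist only on the newer host never reach the guest's saved state.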
Perhaps you are hitting the issue fixed by:
https://lore.kernel.org/lkml/CAJ6HWG66HZ7raAa+YK0UOGLF+4O3JnzbZ+a-0j8GNixOhLk9dA@xxxxxxxxxxxxxx/T/

Thanks for the response. I am not sure.

I suspect Igor is right.  PKRU/PKU, the offending XSAVE feature in that bug, is
in the "new in 5800" list below, and that bug fix went into v5.17, i.e. should
also be fixed in v5.18.
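
One way to see which features a newer host exposes that an older one lacks is to compare the `flags` lines of `/proc/cpuinfo` from both machines. The sketch below is illustrative only: the two sample strings are abbreviated, made-up stand-ins for real output, and the function names are hypothetical.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_destination(source_text, dest_text):
    """List flags the source host exposes that the destination host lacks."""
    return sorted(parse_flags(source_text) - parse_flags(dest_text))


# Abbreviated, illustrative samples; real flag lines are much longer.
NEWER_HOST = "processor\t: 0\nflags\t\t: fpu sse2 avx2 xsave pku ospke\n"
OLDER_HOST = "processor\t: 0\nflags\t\t: fpu sse2 avx2 xsave\n"

print(missing_on_destination(NEWER_HOST, OLDER_HOST))  # ['ospke', 'pku']
```

In practice the two inputs would be the contents of `/proc/cpuinfo` captured on each hypervisor. Any flag in the result is a candidate for state the destination cannot restore.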

Unfortunately, there's no Fixes: provided and I'm having a hell of a time trying
to figure out when the bug was actually introduced.  The v5.15 code base is quite
different due to a rather massive FPU rework in v5.16.  That fix definitely would
not apply cleanly, but it doesn't mean that the underlying root cause is different,
e.g. the buggy code could easily have been lurking for multiple kernel versions
before the rework in v5.16.

That patch is from Feb. Would the bug have been introduced sometime in May to
the 5.15 kernel that Ubuntu 22 would have tracked?

Dates don't necessarily mean a whole lot when it comes to stable kernels, e.g.
it's not uncommon for a change to be backported to a stable kernel weeks/months
after it initially landed in the upstream tree.

Is moving to v5.17 or later an option for you?  If not, what was the "original"
Ubuntu 22 kernel version that worked?  Ideally, assuming it's the same FPU/PKU bug,
the fix would be backported to v5.15, but that's likely going to be quite difficult,
especially without knowing exactly which commit introduced the bug.

Thanks Sean, I can, but it just means adjusting our workflow a bit. For our
hypervisors we like to just track LTS and be conservative in what software we
install, sticking with apps and kernels designed specifically to work with
that release / distribution. The Ubuntu 22 kernel that worked back in April
was 5.15.0-25-generic.  TBH, if I am told we were just lucky that things
worked with different hardware and different kernel and KVM versions (i.e.,
migrating bidirectionally between Ubuntu 20.x and 22.x), I would be fine with
that too.  But I was a little surprised that a version bump within the 5.15
kernel series would break what was working.

    ---Mike



