Re: Guest migration between different Ryzen CPU generations

On Fri, Jun 03, 2022, mike tancsa wrote:
> On 6/2/2022 5:46 PM, Sean Christopherson wrote:
> > On Thu, Jun 02, 2022, mike tancsa wrote:
> > > On 6/2/2022 8:42 AM, Igor Mammedov wrote:
> > > > On Tue, 31 May 2022 13:00:07 -0400
> > > > mike tancsa <mike@xxxxxxxxxx> wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > >        I have been using kvm since the Ubuntu 18 and 20.x LTS series of
> > > > > kernels and distributions without any issues on a whole range of Guests
> > > > > up until now. Recently, we spun up an Ubuntu LTS 22 hypervisor to add to
> > > > > the mix and eventually upgrade to. Hardware is a series of Ryzen 7 CPUs
> > > > > (3700x).  Migrations back and forth without issue for Ubuntu 20.x
> > > > > kernels.  The first Ubuntu 22 machine was on identical hardware and all
> > > > > was good with that too. The second Ubuntu 22 based machine was spun up
> > > > > with a newer gen Ryzen, a 5800x.  On the initial kernel version that
> > > > > came with that release back in April, migrations worked as expected
> > > > > between hardware as well as different kernel versions and qemu / KVM
> > > > > versions that come default with the distribution. Not sure if migrations
> > > > > between kernel and KVM versions "accidentally" worked all these years,
> > > > > but they did.  However, we ran into an issue with the kernel
> > > > > 5.15.0-33-generic (possibly with 5.15.0-30 as well) that's part of
> > > > > Ubuntu.  Migrations no longer worked to older generation CPUs.  I could
> > > > > send a guest TO the box and all was fine, but upon sending the guest to
> > > > > another hypervisor, the sender would see it as successfully migrated,
> > > > > but the VM would typically just hang, with 100% CPU utilization, or
> > > > > sometimes crash.  I tried a 5.18 kernel from May 22nd and again the
> > > > > behavior is different. If I specify the CPU as EPYC or EPYC-IBPB, I can
> > > > > migrate back and forth.
> > > > perhaps you are hitting the issue fixed by:
> > > > https://lore.kernel.org/lkml/CAJ6HWG66HZ7raAa+YK0UOGLF+4O3JnzbZ+a-0j8GNixOhLk9dA@xxxxxxxxxxxxxx/T/
> > > > 
> > > Thanks for the response. I am not sure.
> > I suspect Igor is right.  PKRU/PKU, the offending XSAVE feature in that bug, is
> > in the "new in 5800" list below, and that bug fix went into v5.17, i.e. should
> > also be fixed in v5.18.
> > 
> > Unfortunately, there's no Fixes: provided and I'm having a hell of a time trying
> > to figure out when the bug was actually introduced.  The v5.15 code base is quite
> > different due to a rather massive FPU rework in v5.16.  That fix definitely would
> > not apply cleanly, but it doesn't mean that the underlying root cause is different,
> > e.g. the buggy code could easily have been lurking for multiple kernel versions
> > before the rework in v5.16.
> > > That patch is from Feb. Would the bug have been introduced sometime in May to
> > > the 5.15 kernel that Ubuntu 22 would have tracked?
> > Dates don't necessarily mean a whole lot when it comes to stable kernels, e.g.
> > it's not uncommon for a change to be backported to a stable kernel weeks/months
> > after it initially landed in the upstream tree.
> > 
> > Is moving to v5.17 or later an option for you?  If not, what was the "original"
> > Ubuntu 22 kernel version that worked?  Ideally, assuming it's the same FPU/PKU bug,
> > the fix would be backported to v5.15, but that's likely going to be quite difficult,
> > especially without knowing exactly which commit introduced the bug.
> 
> Thanks Sean, I can, but it just means adjusting our workflow a bit. For our
> hypervisors we like to just track LTS and be conservative in what software
> we install and stick with apps and kernels designed specifically to work
> with that release / distribution.

Yeah, tracking LTS is the right thing to do.  I'll try to verify and bisect the bug,
and then get the fix backported to v5.15.y, but it may be a week or two before that
happens.

> The Ubuntu 22 kernel that worked back in April was 5.15.0-25-generic.  TBH,
> if I am told we were just lucky things worked with different hardware and
> different kernels and KVM versions (i.e. migrating bidirectionally from
> ubuntu 20.x to 22.x) I would be fine with that too.  But I was a little
> surprised that a kernel version bump from 5.15 would break what was working.

Migrating between kernel/KVM versions is absolutely supposed to work; this is
firmly a kernel bug.
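
To confirm that PKU really is among the features the 5800X host has and the
3700X hosts lack, something along these lines could be used to diff the "flags"
line of /proc/cpuinfo from the two machines.  This is only a rough sketch: the
helper names parse_flags() and only_on_source() are made up for illustration,
and the sample flag strings are truncated placeholders, not real output from
either host.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; every core reports the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def only_on_source(source_cpuinfo, dest_cpuinfo):
    """Flags present on the migration source but missing on the destination."""
    return sorted(parse_flags(source_cpuinfo) - parse_flags(dest_cpuinfo))


if __name__ == "__main__":
    # Placeholder flag lines (assumption); paste the real ones from each host.
    newer_host = "flags\t\t: fpu sse2 avx2 xsave pku ospke"
    older_host = "flags\t\t: fpu sse2 avx2 xsave"
    print(only_on_source(newer_host, older_host))
```

If "pku" shows up in the result for the real hosts, that is consistent with the
XSAVE/PKRU theory above, though it does not by itself prove the same bug is
being hit.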


