Re: [PATCH] accel/kvm/kvm-all: Handle register access errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 1 Dec 2022 at 11:00, Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
>
> On 2022/12/01 19:40, Peter Maydell wrote:
> > On Thu, 1 Dec 2022 at 10:27, Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
> >>
> >> A register access error typically means something seriously wrong
> >> happened so that anything bad can happen after that and recovery is
> >> impossible.
> >> Even failing one register access is catastorophic as
> >> architecture-specific code are not written so that it torelates such
> >> failures.
> >>
> >> Make sure the VM stop and nothing worse happens if such an error occurs.
> >>
> >> Signed-off-by: Akihiko Odaki <akihiko.odaki@xxxxxxxxxx>
> >
> > In a similar vein there was also
> > https://lore.kernel.org/all/20220617144857.34189-1-peterx@xxxxxxxxxx/
> > back in June, which on the one hand was less comprehensive but on
> > the other does the plumbing to pass the error upwards rather than
> > reporting it immediately at point of failure.
> >
> > I'm in principle in favour but suspect we'll run into some corner
> > cases where we were happily ignoring not-very-important failures
> > (eg if you're running Linux as the host OS on a Mac M1 and your
> > host kernel doesn't have this fix:
> > https://lore.kernel.org/all/YnHz6Cw5ONR2e+KA@xxxxxxxxxx/T/
> > then QEMU will go from "works by sheer luck" to "consistently
> > hits this error check"). So we should aim to land this extra
> > error checking early in the release cycle so we have plenty of
> > time to deal with any bug reports we get about it.

> Actually I found this problem when I tried to run QEMU with KVM on M2
> MacBook Air and encountered a failure described and fixed at:
> https://lore.kernel.org/all/20221201104914.28944-2-akihiko.odaki@xxxxxxxxxx/

Ah, yeah, you're trying to run QEMU+KVM on a heterogenous cluster.
You need to force all the vCPUs to run on only a single host
CPU type. It's a shame the error-reporting for this situation
is not very good, but there's not really any way to tell in
advance, the best you get is an error at the point where a vCPU
happens to migrate over to a different host CPU.

> Although the affected register was not really important, QEMU couldn't
> run the guest well enough because kvm_arch_put_registers for ARM64 is
> written in a way that it fails early. I guess the situation is not so
> different for other architectures as well.

I think Arm is the only one that does this kind of "leave the
handling of the system registers up to the host kernel and treat
them as mostly black-box values to be passed around on migration"
approach. Most other architectures have QEMU know about specific system
registers in the vCPU and only ask the kernel about those, I think.

-- PMM



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux