On Thu, 1 Dec 2022 at 11:00, Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
>
> On 2022/12/01 19:40, Peter Maydell wrote:
> > On Thu, 1 Dec 2022 at 10:27, Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
> >>
> >> A register access error typically means something seriously wrong
> >> happened so that anything bad can happen after that and recovery is
> >> impossible.
> >> Even failing one register access is catastorophic as
> >> architecture-specific code are not written so that it torelates such
> >> failures.
> >>
> >> Make sure the VM stop and nothing worse happens if such an error occurs.
> >>
> >> Signed-off-by: Akihiko Odaki <akihiko.odaki@xxxxxxxxxx>
> >
> > In a similar vein there was also
> > https://lore.kernel.org/all/20220617144857.34189-1-peterx@xxxxxxxxxx/
> > back in June, which on the one hand was less comprehensive but on
> > the other does the plumbing to pass the error upwards rather than
> > reporting it immediately at point of failure.
> >
> > I'm in principle in favour but suspect we'll run into some corner
> > cases where we were happily ignoring not-very-important failures
> > (eg if you're running Linux as the host OS on a Mac M1 and your
> > host kernel doesn't have this fix:
> > https://lore.kernel.org/all/YnHz6Cw5ONR2e+KA@xxxxxxxxxx/T/
> > then QEMU will go from "works by sheer luck" to "consistently
> > hits this error check"). So we should aim to land this extra
> > error checking early in the release cycle so we have plenty of
> > time to deal with any bug reports we get about it.
>
> Actually I found this problem when I tried to run QEMU with KVM on M2
> MacBook Air and encountered a failure described and fixed at:
> https://lore.kernel.org/all/20221201104914.28944-2-akihiko.odaki@xxxxxxxxxx/

Ah, yeah, you're trying to run QEMU+KVM on a heterogenous cluster.
You need to force all the vCPUs to run on only a single host CPU type.
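
As a rough illustration of what "force onto a single host CPU type" can look like on a Linux host, here is a minimal sketch that restricts the calling thread to a range of host CPU numbers before anything else is started. The helper name pin_to_cpu_range is made up for this example, and which CPU numbers belong to one core type is host-specific: the caller has to supply that, the sketch does not detect it.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to host CPUs
 * first..last (inclusive), skipping any CPU that is not in its
 * current affinity mask. Threads and processes created afterwards
 * inherit the mask.
 *
 * Returns the number of CPUs in the new mask, or -1 on error.
 * Which CPU numbers share a core type is an assumption the caller
 * must provide; nothing here queries the hardware.
 */
static int pin_to_cpu_range(int first, int last)
{
    cpu_set_t allowed, wanted;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return -1;
    }

    CPU_ZERO(&wanted);
    for (int cpu = first; cpu <= last && cpu < CPU_SETSIZE; cpu++) {
        if (cpu >= 0 && CPU_ISSET(cpu, &allowed)) {
            CPU_SET(cpu, &wanted);
        }
    }

    if (CPU_COUNT(&wanted) == 0) {
        return -1;      /* none of the requested CPUs is usable */
    }
    if (sched_setaffinity(0, sizeof(wanted), &wanted) != 0) {
        return -1;
    }
    return CPU_COUNT(&wanted);
}
```

From a shell the same effect is usually had by launching QEMU under taskset with a CPU list (for example "taskset -c 0-3", where 0-3 is only a placeholder for whichever CPUs are of one type on the host in question).
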
It's a shame the error-reporting for this situation is not very good,
but there's not really any way to tell in advance; the best you get
is an error at the point where a vCPU happens to migrate over to a
different host CPU.

> Although the affected register was not really important, QEMU couldn't
> run the guest well enough because kvm_arch_put_registers for ARM64 is
> written in a way that it fails early. I guess the situation is not so
> different for other architectures as well.

I think Arm is the only one that does this kind of "leave the handling
of the system registers up to the host kernel and treat them as
mostly black-box values to be passed around on migration" approach.
Most other architectures have QEMU know about specific system
registers in the vCPU and only ask the kernel about those, I think.

-- PMM