Re: Moved FPU switching causes guest segfaults

On Mon, Jan 28, 2019 at 10:46:41PM +0100, Jens Sauer wrote:
> Hey there,
> 
> I am using a 4.14.y kernel on a host system based on Debian 9. It uses a
> custom kernel configuration that has been running on the 4.14.y series
> since February 2018. This machine is used as a qemu/kvm host server.
> 
> Last week I upgraded the kernel from 4.14.76 to 4.14.96, and suddenly
> one of the guests started hitting segmentation faults.
> It is an x86_64 guest running OPNsense 18.7.10_3, which is based on
> FreeBSD 11.1-RELEASE-p18.
> 
> The error appears directly after booting the guest. Sometimes there are
> "segmentation fault" errors in the boot log, or checksum verification
> fails during boot. The guest appears to be slow; the web UI does not
> respond about 90 % of the time.
> TLS connections from the guest to any remote host fail, and the
> errors were always caused by failing signature verification of the
> remote host.
> 
> After a few minutes the FreeBSD kernel logs: "HBSD SEGVGUARD suspension
> expired python2.7" or "php-cgi".
> 
> At no time could I see any errors on the host.
> 
> I am not sure whether this problem also appears in any of my Linux-based
> guests. I shut down the Linux guests as soon as I noticed the errors in
> the FreeBSD guest.
> I was afraid of a bad memory stick, but no CE/UE were reported by the
> MC on the host. I ran memtest twice, which found no errors.
> 
> I then bisected between tags v4.14.76 and v4.14.96, which identified
> commit 4124a4cff344abbf8187775eb643d9827830e715
> as the first bad commit.
> 
> Please let me know if you need more information or how I can help you
> to track down the issue. I hope this report is sufficient; it is my
> first bug report for the kernel.

Does the attached patch resolve your issues?  I'm fairly certain it's
correct, but AFAIK none of the original reporters has confirmed the fix.

From d07c20a3caf348d3e9e83ccf60f3ceffa9d87e4a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Date: Mon, 28 Jan 2019 12:07:51 -0800
Subject: [PATCH] KVM: x86: Fix a 4.14 backport regression related to
 userspace/guest FPU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Upstream commit:

    f775b13eedee ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")

introduced a bug, which was later fixed by upstream commit:

    5663d8f9bbe4 ("kvm: x86: fix WARN due to uninitialized guest FPU state")

For reasons unknown, both commits were initially passed over for
inclusion in the 4.14 stable branch despite being tagged for stable.
Eventually, someone noticed that the fixup, commit 5663d8f9bbe4, was
missing from stable[1], and so it was queued up for 4.14 and included in
release v4.14.79.

Even later, the original buggy patch, commit f775b13eedee, was also
applied to the 4.14 stable branch.  Through an unlucky coincidence, the
incorrect ordering did not generate a conflict between the two patches,
and led to v4.14.94 and later releases containing a spurious call to
kvm_load_guest_fpu() in kvm_arch_vcpu_ioctl_run().  As a result, KVM may
reload stale guest FPU state, e.g. after accepting an INIT event.  This
can manifest as crashes during boot, segfaults, failed checksums and so
on.

Remove the unwanted kvm_{load,put}_guest_fpu() calls, i.e. make
kvm_arch_vcpu_ioctl_run() look like commit 5663d8f9bbe4 was backported
after commit f775b13eedee.

[1] https://www.spinics.net/lists/stable/msg263931.html

Fixes: 4124a4cff344 ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
Cc: stable@xxxxxxxxxxxxxxx
Cc: Sasha Levin <sashal@xxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
Reported-by: Roman Mamedov
Reported-by: Thomas Lindroth <thomas.lindroth@xxxxxxxxx>
Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
---
 arch/x86/kvm/x86.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 130be2efafbe..af7ab2c71786 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7423,14 +7423,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		}
 	}
 
-	kvm_load_guest_fpu(vcpu);
-
 	if (unlikely(vcpu->arch.complete_userspace_io)) {
 		int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
 		vcpu->arch.complete_userspace_io = NULL;
 		r = cui(vcpu);
 		if (r <= 0)
-			goto out_fpu;
+			goto out;
 	} else
 		WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
 
@@ -7439,8 +7437,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	else
 		r = vcpu_run(vcpu);
 
-out_fpu:
-	kvm_put_guest_fpu(vcpu);
 out:
 	kvm_put_guest_fpu(vcpu);
 	post_kvm_run_save(vcpu);
-- 
2.20.1
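
For readers who want to see why unbalanced load/put calls are harmful, here
is a small user-space toy model. It is not kernel code: struct toy_vcpu,
toy_load_guest_fpu(), toy_put_guest_fpu() and toy_run() are hypothetical
names, a single int stands in for the whole FPU register file, and the
unguarded save/restore semantics are an assumption made for illustration.
It only sketches how an extra load/put pair can leave the guest with stale
state and hand guest state to userspace.

```c
#include <assert.h>

/* Toy model: one int stands in for the entire FPU register file. */
struct toy_vcpu {
    int hw_regs;    /* state currently live in the "hardware" registers */
    int user_fpu;   /* saved userspace (e.g. QEMU) state */
    int guest_fpu;  /* saved guest state */
};

enum { USER_STATE = 1, GUEST_OLD = 2, GUEST_NEW = 3 };

/* Assumed semantics: save live registers as userspace state, load guest. */
static void toy_load_guest_fpu(struct toy_vcpu *v)
{
    v->user_fpu = v->hw_regs;
    v->hw_regs = v->guest_fpu;
}

/* Assumed semantics: save live registers as guest state, restore userspace. */
static void toy_put_guest_fpu(struct toy_vcpu *v)
{
    v->guest_fpu = v->hw_regs;
    v->hw_regs = v->user_fpu;
}

/* One toy "run"; spurious != 0 adds an extra, unbalanced load/put pair. */
static struct toy_vcpu toy_run(int spurious)
{
    struct toy_vcpu v = { USER_STATE, 0, GUEST_OLD };

    toy_load_guest_fpu(&v);
    if (spurious)
        toy_load_guest_fpu(&v);  /* overwrites user_fpu with guest state */

    v.hw_regs = GUEST_NEW;       /* the guest runs and changes its registers */

    if (spurious)
        toy_put_guest_fpu(&v);   /* saves GUEST_NEW, restores bogus user_fpu */
    toy_put_guest_fpu(&v);       /* then saves the bogus state as guest state */
    return v;
}

/* Self-check of both paths in the toy model. */
static void toy_self_check(void)
{
    struct toy_vcpu ok = toy_run(0);
    struct toy_vcpu bad = toy_run(1);

    assert(ok.guest_fpu == GUEST_NEW && ok.hw_regs == USER_STATE);
    assert(bad.guest_fpu == GUEST_OLD && bad.hw_regs == GUEST_OLD);
}
```

In this model the balanced path keeps the guest's latest state and restores
userspace's registers, while the path with the extra pair ends with the
guest's old state saved and the userspace registers clobbered.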

