On Mon, Jan 28, 2019 at 10:46:41PM +0100, Jens Sauer wrote:
> Hey there,
>
> I am using a 4.14.y kernel on a host system based on Debian 9. It is a
> custom kernel configuration which runs since February 2018 on the
> 4.14.y series. This machine is used as qemu/kvm host server.
>
> Last week I upgraded the kernel from 4.14.76 to 4.14.96 and suddenly
> one of the guest runs into segmentation fault errors.
> It is a x86_64 guest running OPNsense 18.7.10_3 which is based on
> FreeBSD 11.1-RELEASE-p18.
>
> The error appears directly after booting the guest. Sometimes there are
> a "segmentation fault" errors in the boot log or checksum verification
> fails during boot. The guest appears to be slow, the web UI does not
> response in 90 % of the time.
> TLS connections are failing from the guest to any remote hosts, the
> errors were always caused by failing signature verification of the
> remote host.
>
> After a few minutes the FreeBSD kernel logs: "HBSD SEGVGUARD suspension
> expired python2.7" or "php-cgi".
>
> At no time I could see any errors on the host.
>
> I am not sure if this problem does appear in any of my linux based
> guest. I shut down the linux guests as soon as I noticed the errors in
> the FreeBSD guest.
> I was afraid of a bad memory stick, there were no CE/UE reported by the
> MC on the host. I ran memtest twice, which resulted in no error.
>
> I then made a bisect between tag v4.14.76 and v4.14.96 which identified
> commit 4124a4cff344abbf8187775eb643d9827830e715
> as the first bad commit.
>
> Please let me know if you need more information or how I can help you
> to track down the issue. I hope this report is sufficient, it is my
> first bug report for the kernel.

Does the attached patch resolve your issues?  I'm fairly certain it's
correct, but AFAIK none of the original reporters has confirmed the fix.

From d07c20a3caf348d3e9e83ccf60f3ceffa9d87e4a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Date: Mon, 28 Jan 2019 12:07:51 -0800
Subject: [PATCH] KVM: x86: Fix a 4.14 backport regression related to userspace/guest FPU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Upstream commit:

    f775b13eedee ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")

introduced a bug, which was later fixed by upstream commit:

    5663d8f9bbe4 ("kvm: x86: fix WARN due to uninitialized guest FPU state")

For reasons unknown, both commits were initially passed over for
inclusion in the 4.14 stable branch despite being tagged for stable.
Eventually, someone noticed that the fixup, commit 5663d8f9bbe4, was
missing from stable[1], and so it was queued up for 4.14 and included in
release v4.14.79.

Even later, the original buggy patch, commit f775b13eedee, was also
applied to the 4.14 stable branch.  Through an unlucky coincidence, the
incorrect ordering did not generate a conflict between the two patches,
and led to v4.14.94 and later releases containing a spurious call to
kvm_load_guest_fpu() in kvm_arch_vcpu_ioctl_run().  As a result, KVM may
reload stale guest FPU state, e.g. after accepting an INIT event.  This
can manifest as crashes during boot, segfaults, failed checksums and so
on and so forth.

Remove the unwanted kvm_{load,put}_guest_fpu() calls, i.e. make
kvm_arch_vcpu_ioctl_run() look like commit 5663d8f9bbe4 was backported
after commit f775b13eedee.

[1] https://www.spinics.net/lists/stable/msg263931.html

Fixes: 4124a4cff344 ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
Cc: stable@xxxxxxxxxxxxxxx
Cc: Sasha Levin <sashal@xxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
Reported-by: Roman Mamedov
Reported-by: Thomas Lindroth <thomas.lindroth@xxxxxxxxx>
Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
---
 arch/x86/kvm/x86.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 130be2efafbe..af7ab2c71786 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7423,14 +7423,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		}
 	}
 
-	kvm_load_guest_fpu(vcpu);
-
 	if (unlikely(vcpu->arch.complete_userspace_io)) {
 		int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
 		vcpu->arch.complete_userspace_io = NULL;
 		r = cui(vcpu);
 		if (r <= 0)
-			goto out_fpu;
+			goto out;
 	} else
 		WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
 
@@ -7439,8 +7437,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	else
 		r = vcpu_run(vcpu);
 
-out_fpu:
-	kvm_put_guest_fpu(vcpu);
 out:
 	kvm_put_guest_fpu(vcpu);
 	post_kvm_run_save(vcpu);
-- 
2.20.1
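
To make the "stale guest FPU state" failure mode from the commit message a bit more concrete, here is a toy model in plain C.  It is only a sketch and not kernel code: struct toy_vcpu and the toy_*() functions are hypothetical names, a single int stands in for a whole FPU register image, and the idea that the live state changes between the two load sites (e.g. while events are processed) is an assumption of the model, not a description of the actual x86.c flow.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only: none of these names exist in the kernel.  An int
 * stands in for a full FPU register image.
 */
struct toy_vcpu {
	int saved_guest_fpu;	/* in-memory copy of the guest FPU state */
	int hw_fpu;		/* state currently live in the "registers" */
	bool guest_loaded;	/* true while guest state is live */
};

/* Copy the in-memory guest image into the registers. */
void toy_load_guest_fpu(struct toy_vcpu *v)
{
	v->hw_fpu = v->saved_guest_fpu;
	v->guest_loaded = true;
}

/* Save the live registers back to the in-memory guest image. */
void toy_put_guest_fpu(struct toy_vcpu *v)
{
	assert(v->guest_loaded);
	v->saved_guest_fpu = v->hw_fpu;
	v->guest_loaded = false;
}

/*
 * One pass through a simplified "run" path.  new_state models a change
 * made to the live guest state between the two load sites (an assumption
 * of this model).  spurious_load selects the regressed flow with the
 * extra load.  Returns the guest FPU state that ends up saved.
 */
int toy_run(struct toy_vcpu *v, int new_state, bool spurious_load)
{
	toy_load_guest_fpu(v);		/* load at entry */
	v->hw_fpu = new_state;		/* live state is updated */
	if (spurious_load)
		toy_load_guest_fpu(v);	/* second load clobbers the update */
	toy_put_guest_fpu(v);
	return v->saved_guest_fpu;
}
```

In this model, starting from a saved image of 1, toy_run(&v, 2, false) leaves 2 saved, while toy_run(&v, 2, true) leaves the old value 1 in place: the second load silently discards whatever happened to the live state in between, which is the kind of breakage the patch avoids by dropping the extra call.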