Re: Regression in v4.14.94 by "x86,kvm: move qemu/guest FPU switching out to vcpu_run"

On Mon, Jan 28, 2019 at 08:25:20PM +0100, Thomas Lindroth wrote:
> I run a qemu/kvm VM with debian and I've started getting segfaults and failing checksums on
> downloaded files. The failures are nondeterministic and similar to the failures you get with
> bad RAM. I tried to diagnose the problem with various testing tools and found that
> "stress-ng --verify --cpu 1" always gives an error. Stress-ng gives one of these errors,
> usually within 60 sec:
> 
>   stress-ng-cpu: Newton-Rapshon sqrt not accurate enough
>   stress-ng-cpu: prime error detected, number of primes between 0 and 1000000 miscalculated
> 
> Nothing relevant has changed recently in the VM but the host kernel was upgraded from
> 4.14.93 to 4.14.96. I can't reproduce the stress-ng error with a 4.14.93 host kernel. There
> is only one kvm related change in that range so I tried to revert that one.
> 
> By reverting commit 4124a4cff344abbf8187775eb643d9827830e715
> "x86,kvm: move qemu/guest FPU switching out to vcpu_run" on kernel 4.14.96 I can't reproduce
> the stress-ng error and I have no segfault or other problems with the guest.

This is the second report of this issue:

    https://bugzilla.kernel.org/show_bug.cgi?id=202419

Upon inspection, the commit in question is obviously buggy:
kvm_arch_vcpu_ioctl_run() doubles up on kvm_{load,put}_guest_fpu().

The ordering of mainline commits:

    f775b13eedee ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")

and

    5663d8f9bbe4 ("kvm: x86: fix WARN due to uninitialized guest FPU state")

were reversed when backported to 4.14.  Commit 5663d8f9bbe4 even explicitly
notes that it fixes f775b13eedee.  I'll send a patch.

> 
> The commit was originally introduced in v4.15-rc3 (Nov 14 2017) and was only recently
> backported to 4.14. The other stable kernels before 4.14 didn't get any backport, so it looks
> like a broken 4.14 backport. That backport also causes problems for other people:
> https://bugzilla.kernel.org/show_bug.cgi?id=202419
> 
> I've rebooted between the different kernels and rebooted the VM enough to be reasonably sure
> that commit is the problem. Stress-ng never lasts more than 10 min with that commit but works
> for hours without it.
> 
> Steps to reproduce would be to create a qemu/kvm VM with debian stretch, install stress-ng
> version 0.07.16 and run "stress-ng --verify --cpu 1".
> 
> Here is the qemu-3.1.0 commandline generated by libvirt:
> /usr/bin/qemu-system-x86_64 -name guest=debian,debug-threads=on -S -object
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-debian/master-key.aes
> -machine pc-i440fx-2.4,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX -m 2048
> -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid
> 0473ded4-d417-4b0e-a4f5-36ba5a2cd675 -no-user-config -nodefaults -chardev
> socket,id=charmonitor,fd=21,server,nowait -mon chardev=charmonitor,id=monitor,mode=control
> -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown
> -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on
> -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device
> ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device
> ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -drive
> if=none,id=drive-ide0-0-1,readonly=on -device
> ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1,bootindex=2 -drive
> file=/mnt/gemini.61rn.3T/Backups/debian.raw,format=raw,if=none,id=drive-virtio-disk0 -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -netdev tap,fd=23,id=hostnet0 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 -spice
> port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
> VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device AC97,id=sound0,bus=pci.0,addr=0x7
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -object
> rng-random,id=objrng0,filename=/dev/random -device
> virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
> 
> My host kernel .config is big so I put it in a paste: http://sprunge.us/u7YNBt


