On Tue, Sep 10, 2019 at 2:32 PM Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Sun, Sep 08, 2019 at 06:37:43AM -0400, James Harvey wrote:
> > Host is up to date Arch Linux, with exception of downgrading linux to
> > track this down to 5.2.11 - 5.2.13.  QEMU 4.1.0, but have also
> > downgraded to 4.0.0 to confirm no change.
> >
> > Host is dual E5-2690 v1 Xeons.  With hyperthreading, 32 logical cores.
> > I've always been able to boot qemu with "-smp
> > cpus=30,cores=15,threads=1,sockets=2".  I leave 2 free for host
> > responsiveness.
> >
> > Upgrading from 5.2.10 to 5.2.11 causes the VM to lock up while loading
> > the initramfs about 90-95% of the time.  (Probably a slight race
> > condition.)  On host, QEMU shows as nVmCPUs*100% CPU usage, so around
> > 3000% for 30 cpus.
> >
> > If I back down to "cpus=16,cores=8", it always boots.  If I increase
> > to "cpus=18,cores=9", it goes back to locking up 90-95% of the time.
> >
> > Omitting "-accel=kvm" allows 5.2.11 to work on the host without issue,
> > so combined with that the only package needing to be downgraded is
> > linux to 5.2.10 to prevent the issue with KVM, I think this must be a
> > KVM issue.
> >
> > Using version of QEMU with debug symbols gives:
> > * gdb backtrace: http://ix.io/1UyO
>
> Fudge.
>
> One of the threads is deleting a memory region, and v5.2.11 reverted a
> change related to flushing sptes on memory region deletion.
>
> Can you try reverting the following commit?  Reverting the revert isn't a
> viable solution, but it'll at least be helpful to confirm this it's the
> source of your troubles.
>
> commit 2ad350fb4c924f611d174e2b0da4edba8a6e430a
> Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Date:   Thu Aug 15 09:43:32 2019 +0200
>
>     Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"
>
>     commit d012a06ab1d23178fc6856d8d2161fbcc4dd8ebd upstream.
>
>     This reverts commit 4e103134b862314dc2f2f18f2fb0ab972adc3f5f.
>     Alex Williamson reported regressions with device assignment with
>     this patch.  Even though the bug is probably elsewhere and still
>     latent, this is needed to fix the regression.
>
>     Fixes: 4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot", 2019-02-05)
>     Reported-by: Alex Willamson <alex.williamson@xxxxxxxxxx>
>     Cc: stable@xxxxxxxxxxxxxxx
>     Cc: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>     Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

Yes, confirmed: reverting this commit (which restores the originally
reverted commit) fixes the issue.

I'm really surprised not to have found similar reports, especially from
Arch users, who had 5.2.11 put into the repos on Aug 29.  Makes me
wonder if it's reproducible on all hardware using host hyperthreading
and giving a VM > nproc/2 virtual cpus.

In the meantime, what should go into distro decisions on whether to
revert, given that you mentioned "Reverting the revert isn't a viable
solution"?
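
For anyone wanting to check their own setup against that nproc/2 guess, here is a rough sketch. The helper name is made up for illustration, and the threshold is only my hypothesis from the three data points in this report (16 vCPUs boots, 18 and 30 lock up on a host with 32 logical cores); nothing here is established KVM behaviour.

```python
import os


def exceeds_half_host_cpus(vcpus, host_logical_cpus=None):
    """Return True if the guest has more vCPUs than half the host's logical CPUs.

    Hypothetical helper: the nproc/2 threshold is only a guess from this
    report, not a documented limit.
    """
    if host_logical_cpus is None:
        # Fall back to the logical CPU count of the machine running the script.
        host_logical_cpus = os.cpu_count() or 1
    # Compare 2*vcpus against the host count to avoid fractional halves.
    return 2 * vcpus > host_logical_cpus


if __name__ == "__main__":
    # Data points from this report: host has 32 logical cores.
    for vcpus in (16, 18, 30):
        print(vcpus, exceeds_half_host_cpus(vcpus, host_logical_cpus=32))
```

On the numbers above this separates the working configuration (16) from the two that lock up (18 and 30), which is consistent with the guess but far too little data to confirm it.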