On Wed, 13 Jan 2021 12:17:19 -0800
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

> On Tue, Jan 12, 2021, Zdenek Kaspar wrote:
> > On Tue, 22 Dec 2020 22:26:45 +0100
> > Zdenek Kaspar <zkaspar82@xxxxxxxxx> wrote:
> >
> > > On Tue, 22 Dec 2020 09:07:39 -0800
> > > Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > > On Mon, Dec 21, 2020, Zdenek Kaspar wrote:
> > > > > [ 179.364305] WARNING: CPU: 0 PID: 369 at kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
> > > > > [ 179.365415] Call Trace:
> > > > > [ 179.365443]  paging64_page_fault+0x244/0x8e0 [kvm]
> > > >
> > > > This means the shadow page zapping is occurring because KVM is
> > > > hitting the max number of allowed MMU shadow pages.  Can you
> > > > provide your QEMU command line?  I can reproduce the performance
> > > > degradation, but only by deliberately overriding the max number
> > > > of MMU pages via `-machine kvm-shadow-mem` to be an absurdly
> > > > low value.
> > > >
> > > > > [ 179.365596]  kvm_mmu_page_fault+0x376/0x550 [kvm]
> > > > > [ 179.365725]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
> > > > > [ 179.365772]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
> > > > > [ 179.365938]  __x64_sys_ioctl+0x338/0x720
> > > > > [ 179.365992]  do_syscall_64+0x33/0x40
> > > > > [ 179.366013]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
> > > It's one long line, added "\" for mail readability:
> > >
> > > qemu-system-x86_64 -machine type=q35,accel=kvm \
> > > -cpu host,host-cache-info=on -smp cpus=2,cores=2 \
> > > -m size=1024 -global virtio-pci.disable-legacy=on \
> > > -global virtio-pci.disable-modern=off \
> > > -device virtio-balloon \
> > > -device virtio-net,netdev=tap-build,mac=DE:AD:BE:EF:00:80 \
> > > -object rng-random,filename=/dev/urandom,id=rng0 \
> > > -device virtio-rng,rng=rng0 \
> > > -name build,process=qemu-build \
> > > -drive file=/mnt/data/export/unix/kvm/build/openbsd-amd64.img,if=virtio,cache=none,format=raw,aio=native \
> > > -netdev type=tap,id=tap-build,vhost=on \
> > > -serial none \
> > > -parallel none \
> > > -monitor unix:/dev/shm/kvm-build.sock,server,nowait \
> > > -enable-kvm -daemonize -runas qemu
> > >
> > > Z.
> >
> > BTW, v5.11-rc3 with kvm-shadow-mem=1073741824 seems OK.
> >
> > Just curious what v5.8 does
>
> Aha!  Figured it out.  v5.9 (the commit you bisected to) broke the
> zapping, that's what it did.  The list of MMU pages is a FIFO list,
> meaning KVM adds entries to the head, not the tail.  I botched the
> zapping flow and used for_each instead of for_each_reverse, which
> meant KVM would zap the _newest_ pages instead of the _oldest_ pages.
> So once a VM hit its limit, KVM would constantly zap the shadow
> pages it had just allocated.
>
> This should resolve the performance regression, or at least make it
> far less painful.  It's possible you may still see some performance
> degradation due to other changes in the zapping, e.g. more
> aggressive recursive zapping.  If that's the case, I can explore
> other tweaks, e.g. skipping higher levels when possible.  I'll get a
> proper patch posted later today.
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c478904af518..2c6e6fdb26ad 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2417,7 +2417,7 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
>  		return 0;
>  
>  restart:
> -	list_for_each_entry_safe(sp, tmp, &kvm->arch.active_mmu_pages, link) {
> +	list_for_each_entry_safe_reverse(sp, tmp, &kvm->arch.active_mmu_pages, link) {
>  		/*
>  		 * Don't zap active root pages, the page itself can't be freed
>  		 * and zapping it will just force vCPUs to realloc and reload.
>
> Side topic, I still can't figure out how on earth your guest kernel
> is hitting the default max number of pages.  Even with large pages
> completely disabled, PTI enabled, multiple guest processes running,
> etc... I hit OOM in the guest before the host's shadow page limit
> kicks in.  I had to force the limit down to 25% of the default to
> reproduce the bad behavior.  All I can figure is that BSD has a
> substantially different paging scheme than Linux.
>
> > so by any chance, is there a command for the kvm-shadow-mem value
> > via the QEMU monitor?
> >
> > Z.

Cool, tested with a quick compile in the guest and it's a good fix!

5.11.0-rc3-amd64 (list_for_each_entry_safe):
- with kvm-shadow-mem=1073741824 (without == unusable)
    0m14.86s real     0m10.87s user     0m12.15s system

5.11.0-rc3-2-amd64 (list_for_each_entry_safe_reverse):
    0m14.36s real     0m10.50s user     0m12.43s system

Thanks,

Z.
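
P.S. A minimal, self-contained userspace sketch of the ordering issue Sean
describes, assuming only standard C (the struct and helper names below are
illustrative, not the real KVM definitions): because new entries go in at the
head of the list, a forward walk like list_for_each_entry_safe() sees the
newest shadow pages first, and only the reverse walk sees the oldest first.

/*
 * Illustrative stand-in for a head-inserted FIFO list of shadow pages.
 * Not kernel code; names are made up for the example.
 */
#include <stdio.h>

struct node {
	int id;                         /* stand-in for a shadow page */
	struct node *prev, *next;
};

/* list_add()-style insertion at the head of the list */
static void add_head(struct node *head, struct node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

int main(void)
{
	struct node head = { .prev = &head, .next = &head };
	struct node pages[4];
	struct node *n;
	int i;

	for (i = 0; i < 4; i++) {
		pages[i].id = i;        /* page 0 is the oldest */
		add_head(&head, &pages[i]);
	}

	printf("forward walk (what the buggy zap iterated):");
	for (n = head.next; n != &head; n = n->next)
		printf(" %d", n->id);   /* prints 3 2 1 0: newest first */

	printf("\nreverse walk (list_for_each_entry_safe_reverse):");
	for (n = head.prev; n != &head; n = n->prev)
		printf(" %d", n->id);   /* prints 0 1 2 3: oldest first */
	printf("\n");

	return 0;
}

Built with gcc, the forward walk prints 3 2 1 0 and the reverse walk prints
0 1 2 3, which is why the one-line s/safe/safe_reverse/ change makes the
zapper evict the oldest pages instead of the ones just allocated.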