On Tue, Jan 12, 2021, Zdenek Kaspar wrote:
> On Tue, 22 Dec 2020 22:26:45 +0100
> Zdenek Kaspar <zkaspar82@xxxxxxxxx> wrote:
> 
> > On Tue, 22 Dec 2020 09:07:39 -0800
> > Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > 
> > > On Mon, Dec 21, 2020, Zdenek Kaspar wrote:
> > > > [ 179.364305] WARNING: CPU: 0 PID: 369 at kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
> > > > [ 179.365415] Call Trace:
> > > > [ 179.365443]  paging64_page_fault+0x244/0x8e0 [kvm]
> > > 
> > > This means the shadow page zapping is occurring because KVM is hitting the
> > > max number of allowed MMU shadow pages.  Can you provide your QEMU command
> > > line?  I can reproduce the performance degradation, but only by deliberately
> > > overriding the max number of MMU pages via `-machine kvm-shadow-mem` to be
> > > an absurdly low value.
> > > 
> > > > [ 179.365596]  kvm_mmu_page_fault+0x376/0x550 [kvm]
> > > > [ 179.365725]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
> > > > [ 179.365772]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
> > > > [ 179.365938]  __x64_sys_ioctl+0x338/0x720
> > > > [ 179.365992]  do_syscall_64+0x33/0x40
> > > > [ 179.366013]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > It's one long line, added "\" for mail readability:
> > 
> > qemu-system-x86_64 -machine type=q35,accel=kvm \
> > -cpu host,host-cache-info=on -smp cpus=2,cores=2 \
> > -m size=1024 -global virtio-pci.disable-legacy=on \
> > -global virtio-pci.disable-modern=off \
> > -device virtio-balloon \
> > -device virtio-net,netdev=tap-build,mac=DE:AD:BE:EF:00:80 \
> > -object rng-random,filename=/dev/urandom,id=rng0 \
> > -device virtio-rng,rng=rng0 \
> > -name build,process=qemu-build \
> > -drive file=/mnt/data/export/unix/kvm/build/openbsd-amd64.img,if=virtio,cache=none,format=raw,aio=native \
> > -netdev type=tap,id=tap-build,vhost=on \
> > -serial none \
> > -parallel none \
> > -monitor unix:/dev/shm/kvm-build.sock,server,nowait \
> > -enable-kvm -daemonize -runas qemu
> > 
> > Z.
> 
> BTW, v5.11-rc3 with kvm-shadow-mem=1073741824 it seems OK.
> 
> Just curious what v5.8 does

Aha!  Figured it out.  v5.9 (the commit you bisected to) broke the zapping,
that's what it did.  The list of MMU pages is a FIFO list, meaning KVM adds
entries to the head, not the tail.  I botched the zapping flow and used
for_each instead of for_each_reverse, which meant KVM would zap the _newest_
pages instead of the _oldest_ pages.  So once a VM hit its limit, KVM would
constantly zap the shadow pages it had just allocated.

This should resolve the performance regression, or at least make it far less
painful.  It's possible you may still see some performance degradation due to
other changes in the zapping, e.g. more aggressive recursive zapping.  If
that's the case, I can explore other tweaks, e.g. skip higher levels when
possible.

I'll get a proper patch posted later today.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c478904af518..2c6e6fdb26ad 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2417,7 +2417,7 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
 		return 0;
 
 restart:
-	list_for_each_entry_safe(sp, tmp, &kvm->arch.active_mmu_pages, link) {
+	list_for_each_entry_safe_reverse(sp, tmp, &kvm->arch.active_mmu_pages, link) {
 		/*
 		 * Don't zap active root pages, the page itself can't be freed
 		 * and zapping it will just force vCPUs to realloc and reload.
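To make the ordering concrete, here is a tiny userspace sketch (not KVM code;
the mmu_page struct and add_head() helper are invented purely for
illustration).  With head insertion, a forward walk visits the newest entries
first, so the old loop zapped exactly the wrong pages; a reverse walk visits
the oldest entries first, which is what the one-liner above restores.

/* Illustrative userspace sketch, not kernel code. */
#include <stdio.h>

struct mmu_page {
	int id;                       /* allocation order: 0 is the oldest */
	struct mmu_page *prev, *next;
};

/* Circular doubly-linked list head, analogous to a kernel list_head. */
static struct mmu_page head = { .id = -1, .prev = &head, .next = &head };

/* Insert at the head, i.e. the newest entry sits right after 'head'. */
static void add_head(struct mmu_page *p)
{
	p->next = head.next;
	p->prev = &head;
	head.next->prev = p;
	head.next = p;
}

int main(void)
{
	struct mmu_page pages[4];
	int i;

	for (i = 0; i < 4; i++) {
		pages[i].id = i;
		add_head(&pages[i]);
	}

	/* Forward walk starts at the newest entry: prints 3 2 1 0. */
	printf("forward (zaps newest first):");
	for (struct mmu_page *p = head.next; p != &head; p = p->next)
		printf(" %d", p->id);
	printf("\n");

	/* Reverse walk starts at the oldest entry: prints 0 1 2 3. */
	printf("reverse (zaps oldest first):");
	for (struct mmu_page *p = head.prev; p != &head; p = p->prev)
		printf(" %d", p->id);
	printf("\n");

	return 0;
}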
Side topic, I still can't figure out how on earth your guest kernel is hitting
the default max number of MMU pages.  Even with large pages completely
disabled, PTI enabled, multiple guest processes running, etc... I hit OOM in
the guest before the host's shadow page limit kicks in.  I had to force the
limit down to 25% of the default to reproduce the bad behavior.  All I can
figure is that BSD has a substantially different paging scheme than Linux.

> so by any chance is there command for kvm-shadow-mem value via qemu monitor?
> 
> Z.
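For the command line, a sketch of how the override discussed above would be
passed (illustrative only; it combines the -machine kvm-shadow-mem knob
mentioned earlier with the value you reported, and the exact property syntax
may vary between QEMU versions):

qemu-system-x86_64 -machine type=q35,accel=kvm,kvm-shadow-mem=1073741824 \
  ... (rest of your command line unchanged)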