https://bugzilla.kernel.org/show_bug.cgi?id=153571

            Bug ID: 153571
           Summary: 100% CPU usage on guest VM
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.10.0-327.22.2.el7.x86_64
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
          Reporter: laurentiu@xxxxxxxx
        Regression: No

Hardware details:

Baremetal:
  Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
  128 GB RAM

L1 guest:
  36 vCPUs
  100 GB RAM

Software details:

Baremetal:
  CentOS Linux release 7.2.1511 (Core)
  qemu-kvm 2.3.0-31.el7.16.1
  Nested KVM enabled

VM:
  CentOS Linux release 7.2.1511 (Core)

On this VM I have 15 L2 guests running.

Problem description:

After a few days of running (more than 90% idle on both baremetal and
compute), the compute node goes to 100% CPU usage (3600%). The L2 guests
are no longer accessible. The only workaround is to shut down the L1 guest
and start it again. A restart of the L1 guest isn't enough.

Running perf record -a -g on the baremetal shows that most of the CPU time
is spent in _raw_spin_lock:

  Children      Self  Command   Shared Object      Symbol
-   93.62%    93.62%  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock
   - _raw_spin_lock
      + 45.30% kvm_mmu_sync_roots
      + 28.49% kvm_mmu_load
      + 25.00% mmu_free_roots
      + 1.12% tdp_page_fault

When the CPU goes to 100%:
- the reported free memory on the host is close to 0 (around 300 MB).
  However, about 50 GB is held as cached/buffered memory.
- the resident memory on the host for the L1 VM is around 50-60 GB
- the free memory inside the L1 VM is around 60 GB
- there is no swapping activity on the host (around 150 MB of used swap).
  Swap is disabled on the L1 guest.
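
As a cross-check of the figures above, here is a small Python sketch that parses the caller breakdown from the perf output and adds up the shares. It is only an illustration: the sample text is copied from this report, the helper name parse_callers is made up for the example, and it assumes the nested percentages are shares of the _raw_spin_lock samples (they sum to roughly 100%).

```python
import re

# Caller breakdown copied from the perf report in this bug.
PERF_CALLERS = """\
+ 45.30% kvm_mmu_sync_roots
+ 28.49% kvm_mmu_load
+ 25.00% mmu_free_roots
+ 1.12% tdp_page_fault
"""


def parse_callers(text):
    """Return {symbol: percent} for lines shaped like '+ 45.30% symbol'.

    Hypothetical helper for this example; it is not part of perf.
    """
    callers = {}
    pattern = r"^[+-]\s+(\d+(?:\.\d+)?)%\s+(\S+)"
    for match in re.finditer(pattern, text, re.MULTILINE):
        callers[match.group(2)] = float(match.group(1))
    return callers


callers = parse_callers(PERF_CALLERS)

# The three callers that load, sync, or free MMU roots.
root_paths = ("kvm_mmu_sync_roots", "kvm_mmu_load", "mmu_free_roots")
root_share = round(sum(callers[name] for name in root_paths), 2)
total_share = round(sum(callers.values()), 2)

# 36 vCPUs, each counted as 100% when fully busy.
vcpus = 36
full_load = vcpus * 100

print(f"MMU root paths: {root_share}% of the _raw_spin_lock samples")
print(f"all listed callers: {total_share}%")
print(f"{vcpus} vCPUs fully busy = {full_load}%")
```

Under that assumption, nearly all of the lock contention comes from the three MMU root paths, and tdp_page_fault accounts for only a small remainder.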

qemu command line:

/usr/libexec/qemu-kvm -name baremetalbrbm_1 -S \
  -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off \
  -cpu host \
  -m 102400 \
  -realtime mlock=off \
  -smp 36,sockets=36,cores=1,threads=1 \
  -uuid 534e9b54-5e4c-4acb-adcf-793f841551a7 \
  -no-user-config \
  -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-baremetalbrbm_1/monitor.sock,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc \
  -no-shutdown \
  -boot menu=off,strict=on \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
  -device ahci,id=sata0,bus=pci.0,addr=0x5 \
  -drive file=/var/lib/libvirt/images/baremetalbrbm_1.qcow2,if=none,id=drive-sata0-0-0,format=qcow2,cache=unsafe \
  -device ide-hd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 \
  -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=26 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:f1:15:20:c5:46,bus=pci.0,addr=0x3 \
  -netdev tap,fd=28,id=hostnet1 \
  -device rtl8139,netdev=hostnet1,id=net1,mac=52:54:00:d3:c9:24,bus=pci.0,addr=0x7 \
  -chardev pty,id=charserial0 \
  -device isa-serial,chardev=charserial0,id=serial0 \
  -vnc 127.0.0.1:2 \
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
  -msg timestamp=on