https://bugzilla.kernel.org/show_bug.cgi?id=199727

Bug ID: 199727
Summary: CPU freezes on KVM guests during high IO load on host
Product: Virtualization
Version: unspecified
Kernel Version: 3.x, 4.2, 4.4, 4.10
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: kvm
Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
Reporter: gkovacs@xxxxxxxxx
Regression: No

Proxmox is a Debian-based virtualization distribution with an Ubuntu LTS based kernel. When there is high IO load on Proxmox v4 and v5 virtualization hosts during vzdump backups, restores, migrations or simply reading/writing of huge files on the same storage where the KVM guests are stored, these guests show the following symptoms:

- CPU soft lockups
- rcu_sched detected stalls
- task blocked, stack dump
- huge latency in network services (even pings time out for several seconds)
- lost network connectivity (Windows guests often lose Remote Desktop connections)

The issue affects KVM guests with VirtIO, VirtIO SCSI and IDE disks, with different guest error messages. It affects Windows and Debian 7/8 guests the worst; Debian 9 and Ubuntu guests are a bit less sensitive.

The issue affects many hardware configurations: we have tested and found it on single and dual socket Westmere, Sandy Bridge and Ivy Bridge Core i7 and Xeon based systems.

The issue is present on many local storage setups, regardless of whether HDD or SSD is used, and was confirmed on the configurations below:

- LVM / ext4 with qCOW2 guests (on ICH and Adaptec connected single HDD, Adaptec HW mirror HDD and Adaptec HW RAID10 HDD tested)
- ZFS with qCOW2 or zVOL guests (on ICH and Adaptec connected single HDD & SSD, ZFS mirror & RAID10 & RAIDZ1 HDD & SSD tested)

REPRODUCTION

1. Install Proxmox 4 or 5 on bare metal (ZFS or LVM+ext4, HDD or SSD, single disk or array)
2. Create Windows and Debian 7 or 8 KVM guests on local storage (with IDE or VirtIO disks, VirtIO network)
3. Start actively polling guest network services from the network (ping, Apache load test, Remote Desktop, etc.) and observe guest consoles
4. Start backing up VMs with the built-in backup function to the same local storage or to an NFS share on the network
5. Restore VM backups from local storage or from an NFS share on the network (or simply copy huge files to local storage from an external disk or the network)

During the backup and restore operations, KVM guests will show the symptoms above.

MITIGATION

If vm.dirty_ratio and vm.dirty_background_ratio are set to very low values on the host (2 and 1), the problem is somewhat less severe.

LINKS

Many users confirmed this issue on different platforms (ZFS+zvol, ZFS+QCOW2, ext4+LVM) over the past few years:

https://forum.proxmox.com/threads/kvm-guests-freeze-hung-tasks-during-backup-restore-migrate.34362/
https://forum.proxmox.com/threads/virtio-task-xxx-blocked-for-more-than-120-seconds.25835/
https://forum.proxmox.com/threads/frequent-cpu-stalls-in-kvm-guests-during-high-io-on-host.30702/

We also filed a bug report in the Proxmox bugzilla, but this bug is most likely in QEMU/KVM:

https://bugzilla.proxmox.com/show_bug.cgi?id=1453
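
For anyone who wants to check whether a host already runs with the values from the MITIGATION section, here is a minimal Python sketch. It assumes the standard Linux procfs files /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio; the function names and the MITIGATION table are made up for this illustration and are not part of Proxmox or QEMU. It only reads the settings and changes nothing.

```python
from pathlib import Path

# Values from the MITIGATION section of this report. They made the
# symptoms less severe in our tests; they are not a fix.
MITIGATION = {"dirty_ratio": 2, "dirty_background_ratio": 1}


def read_vm_settings(base="/proc/sys/vm"):
    """Return the current integer value of each tunable found under `base`."""
    return {
        name: int((Path(base) / name).read_text().strip())
        for name in MITIGATION
    }


def settings_above_mitigation(current, target=MITIGATION):
    """Return {name: (current_value, mitigation_value)} for every tunable
    whose current value is higher than the value used in this report."""
    return {
        name: (current[name], limit)
        for name, limit in target.items()
        if current[name] > limit
    }
```

On a host, `settings_above_mitigation(read_vm_settings())` returns an empty dict when both tunables are already at or below 2 and 1. To try the mitigation until the next reboot, the values can be set as root with `sysctl -w vm.dirty_ratio=2` and `sysctl -w vm.dirty_background_ratio=1`.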