[Bug 199727] New: CPU freezes on KVM guests during high IO load on host

https://bugzilla.kernel.org/show_bug.cgi?id=199727

            Bug ID: 199727
           Summary: CPU freezes on KVM guests during high IO load on host
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.x, 4.2, 4.4, 4.10
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
          Reporter: gkovacs@xxxxxxxxx
        Regression: No

Proxmox is a Debian-based virtualization distribution with an Ubuntu LTS-based
kernel.

When there is high IO load on Proxmox v4 and v5 virtualization hosts during
vzdump backups, restores, migrations or simply reading/writing of huge files on
the same storage where the KVM guests are stored, these guests show the
following symptoms:

- CPU soft lockups
- rcu_sched detected stalls
- task blocked, stack dump
- huge latency in network services (even pings time out for several seconds)
- lost network connectivity (Windows guests often lose Remote Desktop
connections)
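
To quantify the first three symptoms inside a Linux guest, the kernel log can
be scanned for the corresponding messages. The following is a minimal sketch,
not a definitive tool: the regular expressions are assumptions based on the
usual wording of these kernel messages, which varies between kernel versions,
and the function name count_symptoms is hypothetical.

```python
import re
from collections import Counter

# Assumed message patterns; the exact wording differs between kernel versions,
# so adjust these to what the guest actually logs.
SYMPTOM_PATTERNS = {
    "soft_lockup": re.compile(r"soft lockup - CPU#\d+ stuck for \d+s"),
    "rcu_stall": re.compile(r"rcu_sched (self-)?detected stall"),
    "hung_task": re.compile(r"task \S+ blocked for more than \d+ seconds"),
}


def count_symptoms(log_lines):
    """Count how many log lines match each symptom pattern."""
    counts = Counter()
    for line in log_lines:
        for name, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

For example, the output of dmesg in the guest can be split into lines and
passed to count_symptoms() before and after a backup run to compare the counts.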

The issue affects KVM guests with VirtIO, VirtIO SCSI and IDE disks, although
the guest error messages differ. Windows and Debian 7/8 guests are affected the
worst; Debian 9 and Ubuntu guests are somewhat less sensitive.

The issue affects many hardware configurations: we have tested and found it on
single and dual socket Westmere, Sandy Bridge and Ivy Bridge Core i7 and Xeon
based systems.

The issue is present on many local storage setups, with both HDDs and SSDs. It
was confirmed on the following configurations:
- LVM / ext4 with qCOW2 guests (on ICH and Adaptec connected single HDD,
Adaptec HW mirror HDD and Adaptec HW RAID10 HDD tested)
- ZFS with qCOW2 or zVOL guests (on ICH and Adaptec connected single HDD & SSD,
ZFS mirror & RAID10 & RAIDZ1 HDD & SSD tested) 


REPRODUCTION
1. Install Proxmox 4 or 5 on bare metal (ZFS or LVM+ext4, HDD or SSD, single
disk or array)
2. Create Windows and Debian 7 or 8 KVM guests on local storage (with IDE or
VirtIO disks, VirtIO network)
3. Start actively polling guest network services from the network (ping, Apache
load test, Remote Desktop, etc.) and observe guest consoles
4. Start backing up VMs with the built-in backup function to the same local
storage or to an NFS share on the network
5. Restore VM backups from local storage or from an NFS share on the network
(or simply copy huge files to local storage from an external disk or the
network)

During the backup and restore operations, KVM guests will show the symptoms
above.
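
The polling in step 3 can be scripted so that connectivity gaps are recorded
with timestamps instead of being watched by hand. This is only a rough sketch
under stated assumptions: it probes a TCP port on the guest (for example the
Apache port) rather than using ICMP ping, and the function names, the probe
interval and the 2 second outage threshold are all hypothetical choices.

```python
import socket
import time


def probe_tcp(host, port, timeout=1.0):
    """Return the TCP connect latency in seconds, or None if it failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def find_outages(samples, min_duration=2.0):
    """Find runs of consecutive failed probes.

    samples is a time-ordered list of (timestamp, latency_or_None).
    Returns (first_failure, last_failure) timestamp pairs for every run
    that spans at least min_duration seconds.
    """
    outages = []
    run_start = None
    last_fail = None
    for ts, latency in samples:
        if latency is None:
            if run_start is None:
                run_start = ts
            last_fail = ts
        else:
            if run_start is not None and last_fail - run_start >= min_duration:
                outages.append((run_start, last_fail))
            run_start = None
    # A run of failures may still be open when sampling stops.
    if run_start is not None and last_fail - run_start >= min_duration:
        outages.append((run_start, last_fail))
    return outages


def monitor(host, port, duration, interval=0.5):
    """Probe the guest for `duration` seconds and return the samples."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), probe_tcp(host, port)))
        time.sleep(interval)
    return samples
```

Running monitor() against a guest while a backup or restore is in progress and
then passing the samples to find_outages() gives a list of periods during which
the guest did not answer.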


MITIGATION
If vm.dirty_ratio and vm.dirty_background_ratio are set to very low values on
the host (2 and 1, respectively), the problem is somewhat less severe.
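
For reference, a minimal sketch of applying this mitigation at runtime by
writing to the files under /proc/sys/vm, which is where these two sysctls are
exposed. The helper names are hypothetical, writing requires root on the host,
and the change does not persist across reboots unless it is also added to the
sysctl configuration.

```python
from pathlib import Path


def read_vm_setting(name, base="/proc/sys/vm"):
    """Read an integer VM sysctl such as dirty_ratio."""
    return int(Path(base, name).read_text().strip())


def apply_low_dirty_ratios(base="/proc/sys/vm",
                           dirty_ratio=2,
                           dirty_background_ratio=1):
    """Set both ratios and return the previous values for later restore.

    The defaults (2 and 1) are the values mentioned in this report.
    """
    previous = {}
    settings = (
        ("dirty_background_ratio", dirty_background_ratio),
        ("dirty_ratio", dirty_ratio),
    )
    for name, value in settings:
        previous[name] = read_vm_setting(name, base)
        Path(base, name).write_text(f"{value}\n")
    return previous
```

The returned dictionary can be kept to write the old values back after
testing. The base parameter exists only so the helpers can be tried against a
scratch directory instead of the live /proc tree.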


LINKS
Many users confirmed this issue on different platforms (ZFS+zvol, ZFS+QCOW2,
ext4+LVM) over the past few years:
https://forum.proxmox.com/threads/kvm-guests-freeze-hung-tasks-during-backup-restore-migrate.34362/
https://forum.proxmox.com/threads/virtio-task-xxx-blocked-for-more-than-120-seconds.25835/
https://forum.proxmox.com/threads/frequent-cpu-stalls-in-kvm-guests-during-high-io-on-host.30702/

We also filed a bug report in the Proxmox bugzilla, but this bug is most likely
in QEMU/KVM:
https://bugzilla.proxmox.com/show_bug.cgi?id=1453



