On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander <wido@xxxxxxxx> wrote:
>
> Hi,
>
> In a couple of situations I have encountered Virtual Machines running
> on RBD that had a high I/O wait, nearly 100%, on their vdX (VirtIO) or
> sdX (Virtio-SCSI) devices while they were performing CPU-intensive tasks.
>
> These servers would be running a very CPU-intensive application while
> *not* doing much disk I/O.
>
> Nevertheless, I noticed that the I/O wait of the disk(s) in the VM went
> up to 100%.
>
> This VM is CPU-limited by Libvirt, which puts the KVM process in its
> own cgroup with a CPU limitation.
>
> Now, my theory is:
>
> KVM (qemu-kvm) runs completely in userspace, and librbd runs inside
> qemu-kvm as a library. All threads for disk I/O belong to the same PID
> and are thus part of that cgroup.
>
> If a process inside the Virtual Machine now starts to consume all CPU
> time, there is nothing left for librbd, which slows it down.
>
> This then causes an increased I/O wait inside the Virtual Machine, even
> though the VM is not performing a lot of disk I/O.
>
> Is my theory sane?

Yes, I would say that your theory is sane. Have you looked into
libvirt's cgroup controls for limiting the emulator portion vs the
vCPUs [1]? I'd hope the librbd code and threads should be running in
the emulator cgroup (in a perfect world).

> Can somebody confirm this?
>
> Thanks,
>
> Wido
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] https://libvirt.org/cgroups.html

--
Jason
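
As a sketch of the separation Jason is referring to: libvirt's domain
XML `<cputune>` element can apply CFS bandwidth limits to the vCPU
threads separately from the emulator threads (the QEMU main loop and,
one would hope, librbd's worker threads). The numbers below are
hypothetical values for illustration; `period`/`quota` are in
microseconds and the quota applies per vCPU thread.

```xml
<cputune>
  <!-- Cap each vCPU thread at 50% of one host CPU
       (quota / period = 50000 / 100000 = 0.5) -->
  <period>100000</period>
  <quota>50000</quota>
  <!-- Give the emulator threads their own bandwidth budget, so a
       CPU-hungry guest workload cannot starve the I/O path -->
  <emulator_period>100000</emulator_period>
  <emulator_quota>100000</emulator_quota>
</cputune>
```

With a split like this, saturating the vCPUs inside the guest would no
longer consume the CPU time the emulator cgroup needs, which is the
failure mode the theory above describes.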