On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander <wido@xxxxxxxx> wrote:
>
> Hi,
>
> In a couple of situations I have encountered Virtual Machines running
> on RBD that had a high I/O wait, nearly 100%, on their vdX (VirtIO) or
> sdX (Virtio-SCSI) devices while they were performing CPU-intensive tasks.
>
> These servers would be running a very CPU-intensive application while
> *not* doing much disk I/O.
>
> Nevertheless, I noticed that the I/O wait of the disk(s) in the VM went
> up to 100%.
>
> This VM is CPU-limited by Libvirt, which puts the KVM process in its
> own cgroup with a CPU limitation.
>
> Now, my theory is:
>
> KVM (qemu-kvm) runs completely in userspace, and librbd runs inside
> qemu-kvm as a library. All threads for disk I/O belong to the same PID
> and are thus part of that cgroup.
>
> If a process inside the Virtual Machine now starts to consume all CPU
> time, there is nothing left for librbd, which slows it down.
>
> This then causes an increased I/O wait inside the Virtual Machine, even
> though the VM is not performing a lot of disk I/O.
>
> Is my theory sane?

Yes, I would say that your theory is sane. Have you looked into
libvirt's cgroup controls for limiting the emulator portion vs the
vCPUs [1]? I'd hope the librbd code and threads should be running in
the emulator cgroup (in a perfect world).

> Can somebody confirm this?
>
> Thanks,
>
> Wido
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] https://libvirt.org/cgroups.html

--
Jason
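
As a sketch of the separation Jason is referring to: libvirt's domain
XML `<cputune>` element can apply CFS bandwidth limits to the vCPU
threads separately from the emulator threads (the QEMU main loop and,
one would hope, librbd's worker threads). The numbers below are
hypothetical values for illustration; `period`/`quota` are in
microseconds and the quota applies per vCPU thread.

```xml
<cputune>
  <!-- Cap each vCPU thread at 50% of one host CPU
       (quota / period = 50000 / 100000 = 0.5) -->
  <period>100000</period>
  <quota>50000</quota>
  <!-- Give the emulator threads their own bandwidth budget, so a
       CPU-hungry guest workload cannot starve the I/O path -->
  <emulator_period>100000</emulator_period>
  <emulator_quota>100000</emulator_quota>
</cputune>
```

With a split like this, saturating the vCPUs inside the guest would no
longer consume the CPU time the emulator cgroup needs, which is the
failure mode the theory above describes.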