On Thu, Aug 22, 2019 at 11:29 AM Wido den Hollander <wido@xxxxxxxx> wrote:
>
> On 8/22/19 3:59 PM, Jason Dillaman wrote:
> > On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander <wido@xxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> In a couple of situations I have encountered Virtual Machines running
> >> on RBD that had a high I/O-wait, nearly 100%, on their vdX (VirtIO) or
> >> sdX (Virtio-SCSI) devices while they were performing CPU-intensive tasks.
> >>
> >> These servers would be running a very CPU-intensive application while
> >> *not* doing much disk I/O.
> >>
> >> However, I noticed that the I/O-wait of the disk(s) in the VM went up
> >> to 100%.
> >>
> >> This VM is CPU-limited by Libvirt, which puts the KVM process in its
> >> own cgroup with a CPU limitation.
> >>
> >> Now, my theory is:
> >>
> >> KVM (qemu-kvm) runs completely in userspace and librbd runs inside
> >> qemu-kvm as a library. All threads for disk I/O are part of the same
> >> PID and thus part of that cgroup.
> >>
> >> If a process inside the Virtual Machine now starts to consume all CPU
> >> time, there is nothing left for librbd, which slows it down.
> >>
> >> This then causes an increased I/O-wait inside the Virtual Machine,
> >> even though the VM is not performing a lot of disk I/O.
> >>
> >>
> >> Is my theory sane?
> >
> > Yes, I would say that your theory is sane. Have you looked into
> > libvirt's cgroup controls for limiting the emulator portion vs the
> > vCPUs [1]? I'd hope the librbd code and threads should be running in
> > the emulator cgroup (in a perfect world).
> >
>
> I checked with 'virsh schedinfo X' and this is the output I got:
>
> Scheduler      : posix
> cpu_shares     : 1000
> vcpu_period    : 100000
> vcpu_quota     : -1
> emulator_period: 100000
> emulator_quota : -1
> global_period  : 100000
> global_quota   : -1
> iothread_period: 100000
> iothread_quota : -1
>
>
> How can we confirm if the librbd code runs inside the Emulator part?

You can look under the "/proc/<QEMU PID>/task/<THREAD>/" directories. The
"comm" file has the thread's friendly name. If it's a librbd / librados
thread, you will see things like the following (taken from an 'rbd
bench-write' process):

$ cat */comm
rbd
log
service
admin_socket
msgr-worker-0
msgr-worker-1
msgr-worker-2
rbd
ms_dispatch
ms_local
safe_timer
fn_anonymous
safe_timer
safe_timer
fn-radosclient
tp_librbd
safe_timer
safe_timer
taskfin_librbd
signal_handler

Those directories also have "cgroup" files, which indicate which cgroup
the thread is currently living under. For example, the "tp_librbd" thread
is running under the following cgroups in my environment:

11:blkio:/
10:hugetlb:/
9:freezer:/
8:net_cls,net_prio:/
7:memory:/user.slice/user-1000.slice/user@1000.service
6:cpu,cpuacct:/
5:devices:/user.slice
4:perf_event:/
3:cpuset:/
2:pids:/user.slice/user-1000.slice/user@1000.service
1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
0::/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service

> Wido
>
> >> Can somebody confirm this?
> >>
> >> Thanks,
> >>
> >> Wido
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > [1] https://libvirt.org/cgroups.html
> >

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
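
To script the check Jason describes, something along the following lines
can be used. This is only a sketch: it assumes cgroup v1 (as in the output
above) and that QEMU_PID already holds the PID of the qemu-kvm process for
the domain in question. It prints each thread's "comm" name next to the
cpu,cpuacct cgroup it lives in, so the librbd / librados threads
(tp_librbd, msgr-worker-*, fn-radosclient, ...) can be compared against
the vCPU threads:

# Sketch: print each QEMU thread's name and its cpu,cpuacct cgroup (cgroup v1).
# Assumes QEMU_PID holds the PID of the qemu-kvm process, e.g. found via
# "pgrep -f <domain name>".
for t in /proc/"$QEMU_PID"/task/*; do
    printf '%-16s %s\n' "$(cat "$t"/comm)" "$(grep ':cpu,cpuacct:' "$t"/cgroup)"
done

With libvirt's default layout the vCPU threads end up in per-vCPU child
cgroups and the rest of the process in the "emulator" cgroup [1]; if the
librbd threads report the same cgroup path as the vCPUs, they are competing
for the same CPU quota and the high I/O-wait described above is what you
would expect.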