Re: Theory: High I/O-wait inside VM with RBD due to CPU throttling

On 8/22/19 5:49 PM, Jason Dillaman wrote:
> On Thu, Aug 22, 2019 at 11:29 AM Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>>
>>
>> On 8/22/19 3:59 PM, Jason Dillaman wrote:
>>> On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander <wido@xxxxxxxx> wrote:
>>>>
>>>> Hi,
>>>>
>>>> In a couple of situations I have encountered that Virtual Machines
>>>> running on RBD had a high I/O-wait, nearly 100%, on their vdX (VirtIO)
>>>> or sdX (Virtio-SCSI) devices while they were performing CPU intensive tasks.
>>>>
>>>> These servers would be running a very CPU intensive application while
>>>> *not* doing that much disk I/O.
>>>>
>>>> I however noticed that the I/O-wait of the disk(s) in the VM went up to
>>>> 100%.
>>>>
>>>> This VM is CPU limited by Libvirt by putting that KVM process in its
>>>> own cgroup with a CPU limitation.
>>>>
>>>> Now, my theory is:
>>>>
>>>> KVM (qemu-kvm) is completely userspace and librbd runs inside qemu-kvm
>>>> as a library. All threads for disk I/O are part of the same PID and thus
>>>> part of that cgroup.
>>>>
>>>> If a process inside the Virtual Machine now starts to consume all CPU
>>>> time there is nothing left for librbd which slows it down.
>>>>
>>>> This then causes an increased I/O-wait inside the Virtual Machine, even
>>>> though the VM is not performing a lot of disk I/O.
>>>>
>>>>
>>>> Is my theory sane?
>>>
>>> Yes, I would say that your theory is sane. Have you looked into
>>> libvirt's cgroup controls for limiting the emulator portion vs the
>>> vCPUs [1]? I'd hope the librbd code and threads should be running in
>>> the emulator cgroup (in a perfect world).
>>>
>>
>> I checked with 'virsh schedinfo X' and this is the output I got:
>>
>> Scheduler      : posix
>> cpu_shares     : 1000
>> vcpu_period    : 100000
>> vcpu_quota     : -1
>> emulator_period: 100000
>> emulator_quota : -1
>> global_period  : 100000
>> global_quota   : -1
>> iothread_period: 100000
>> iothread_quota : -1
>>
>>
>> How can we confirm if the librbd code runs inside the Emulator part?
> 
> You can look under the "/proc/<QEMU PID>/task/<THREAD>/" directories.
> The "comm" file has the thread's friendly name. If it's a librbd /
> librados thread you will see things like the following (taken from an
> 'rbd bench-write' process):
> 
> $ cat */comm
> rbd
> log
> service
> admin_socket
> msgr-worker-0
> msgr-worker-1
> msgr-worker-2
> rbd
> ms_dispatch
> ms_local
> safe_timer
> fn_anonymous
> safe_timer
> safe_timer
> fn-radosclient
> tp_librbd
> safe_timer
> safe_timer
> taskfin_librbd
> signal_handler
> 
> Those directories also have "cgroup" files which will indicate which
> cgroup the thread is currently living under. For example, the
> "tp_librbd" thread is running under the following cgroups in my
> environment:
> 
> 11:blkio:/
> 10:hugetlb:/
> 9:freezer:/
> 8:net_cls,net_prio:/
> 7:memory:/user.slice/user-1000.slice/user@1000.service
> 6:cpu,cpuacct:/
> 5:devices:/user.slice
> 4:perf_event:/
> 3:cpuset:/
> 2:pids:/user.slice/user-1000.slice/user@1000.service
> 1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
> 0::/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
> 

I checked:

root@n01:/proc/3668710/task# cat 3668748/comm
tp_librbd
root@n01:/proc/3668710/task#

So that seems to be rbd, right? I also checked the 'fn-radosclient' thread.

root@n01:/proc/3668710/task# cat 3668748/cgroup
12:hugetlb:/
11:memory:/machine/i-1551-77-VM.libvirt-qemu
10:freezer:/machine/i-1551-77-VM.libvirt-qemu
9:pids:/system.slice/libvirt-bin.service
8:rdma:/
7:cpu,cpuacct:/machine/i-1551-77-VM.libvirt-qemu/emulator
6:blkio:/machine/i-1551-77-VM.libvirt-qemu
5:cpuset:/machine/i-1551-77-VM.libvirt-qemu/emulator
4:devices:/machine/i-1551-77-VM.libvirt-qemu
3:perf_event:/machine/i-1551-77-VM.libvirt-qemu
2:net_cls,net_prio:/machine/i-1551-77-VM.libvirt-qemu
1:name=systemd:/system.slice/libvirt-bin.service
root@n01:/proc/3668710/task#

It seems that this RBD thread is in the 'emulator', isn't it?
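
The per-thread check above can be repeated for every thread of the QEMU
process at once. This is only a minimal sketch, assuming a cgroup v1 host
like the one shown above (where the controller list contains "cpu"). The
function names cpu_cgroup and thread_cgroups are made up for illustration
and are not part of libvirt or Ceph.

```python
from pathlib import Path


def cpu_cgroup(cgroup_text):
    """Return the path of the cgroup v1 'cpu' controller, or None if absent."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:path".
        parts = line.split(":", 2)
        if len(parts) == 3 and "cpu" in parts[1].split(","):
            return parts[2]
    return None


def thread_cgroups(pid, proc_root="/proc"):
    """Map each thread ID of `pid` to (thread name, cpu cgroup path)."""
    result = {}
    for task in sorted(Path(proc_root, str(pid), "task").iterdir()):
        try:
            name = (task / "comm").read_text().strip()
            cgroup = cpu_cgroup((task / "cgroup").read_text())
        except OSError:
            continue  # the thread exited while we were scanning
        result[int(task.name)] = (name, cgroup)
    return result
```

Calling thread_cgroups(3668710) on the hypervisor would then show, per
thread, whether it sits under ".../emulator" or under one of the vcpu
cgroups.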

Is this what we want?
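
To see whether a given cgroup is actually being throttled, its cpu.stat
file can be read. The following is a rough sketch, assuming cgroup v1 with
the cpu controller mounted at /sys/fs/cgroup/cpu,cpuacct (an assumption,
the mount point may differ per distribution). The helper names are
hypothetical.

```python
from pathlib import Path


def parse_cpu_stat(text):
    """Parse cpu.stat content ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def throttled_fraction(stats):
    """Fraction of enforcement periods in which the cgroup hit its quota."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


def read_throttling(cgroup_path, mount="/sys/fs/cgroup/cpu,cpuacct"):
    """Read cpu.stat for a cgroup path as printed in /proc/<pid>/task/<tid>/cgroup.

    The default mount point is an assumption, not taken from this thread.
    """
    stat_file = Path(mount) / cgroup_path.lstrip("/") / "cpu.stat"
    return parse_cpu_stat(stat_file.read_text())
```

A non-zero and growing nr_throttled for the cgroup that holds the librbd
threads, or for one of its parents, would support the theory. A value
that stays at zero would point elsewhere.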

Wido

> 
>> Wido
>>
>>>> Can somebody confirm this?
>>>>
>>>> Thanks,
>>>>
>>>> Wido
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> [1] https://libvirt.org/cgroups.html
>>>
> 
> 
> 


