On 8/22/19 5:49 PM, Jason Dillaman wrote:
> On Thu, Aug 22, 2019 at 11:29 AM Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>> On 8/22/19 3:59 PM, Jason Dillaman wrote:
>>> On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander <wido@xxxxxxxx> wrote:
>>>>
>>>> Hi,
>>>>
>>>> In a couple of situations I have encountered Virtual Machines
>>>> running on RBD with a high I/O-wait, nearly 100%, on their vdX (VirtIO)
>>>> or sdX (Virtio-SCSI) devices while they were performing CPU-intensive tasks.
>>>>
>>>> These servers were running a very CPU-intensive application while
>>>> *not* doing much disk I/O.
>>>>
>>>> I noticed, however, that the I/O-wait of the disk(s) in the VM went
>>>> up to 100%.
>>>>
>>>> The VM's CPU is limited by libvirt, which places the KVM process in its
>>>> own cgroup with a CPU limitation.
>>>>
>>>> Now, my theory is:
>>>>
>>>> KVM (qemu-kvm) runs completely in userspace and librbd runs inside
>>>> qemu-kvm as a library. All threads for disk I/O are part of the same
>>>> PID and thus part of that cgroup.
>>>>
>>>> If a process inside the Virtual Machine now starts to consume all CPU
>>>> time, there is nothing left for librbd, which slows it down.
>>>>
>>>> This then causes an increased I/O-wait inside the Virtual Machine,
>>>> even though the VM is not performing a lot of disk I/O.
>>>>
>>>> Is my theory sane?
>>>
>>> Yes, I would say that your theory is sane. Have you looked into
>>> libvirt's cgroup controls for limiting the emulator portion vs the
>>> vCPUs [1]? I'd hope the librbd code and threads would be running in
>>> the emulator cgroup (in a perfect world).
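As a quick way to reason about such a limit, the effective CPU cap of a cgroup is its CFS quota divided by its period. A minimal sketch (the helper name `cg_cpu_cap` and the sample values are mine, not from the thread; on a real host you would feed it the values from `cpu.cfs_quota_us` and `cpu.cfs_period_us` under the guest's cgroup):

```shell
#!/bin/bash
# Sketch: turn a cgroup v1 cfs quota/period pair into a readable CPU cap.
# A quota of -1 means the cgroup is not bandwidth-limited.
cg_cpu_cap() {
  local quota=$1 period=$2
  if [ "$quota" -lt 0 ]; then
    echo "unlimited"
  else
    # quota/period is the fraction of one CPU the cgroup may consume
    echo "$(( quota * 100 / period ))% of one CPU"
  fi
}

# Illustrative values; on a host you would use something like:
#   cg=/sys/fs/cgroup/cpu/machine/<domain>.libvirt-qemu   (path is an assumption)
#   cg_cpu_cap "$(cat "$cg"/cpu.cfs_quota_us)" "$(cat "$cg"/cpu.cfs_period_us)"
cg_cpu_cap 200000 100000   # a 2-CPU cap
cg_cpu_cap -1 100000       # no limit
```

If the whole qemu-kvm PID sits under one such capped cgroup, the vCPU threads and the librbd threads compete for the same budget, which matches the theory above.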
>>
>> I checked with 'virsh schedinfo X' and this is the output I got:
>>
>> Scheduler      : posix
>> cpu_shares     : 1000
>> vcpu_period    : 100000
>> vcpu_quota     : -1
>> emulator_period: 100000
>> emulator_quota : -1
>> global_period  : 100000
>> global_quota   : -1
>> iothread_period: 100000
>> iothread_quota : -1
>>
>> How can we confirm whether the librbd code runs inside the emulator part?
>
> You can look under the /proc/<QEMU PID>/task/<THREAD>/ directories.
> The "comm" file has the thread's friendly name. If it's a librbd /
> librados thread you will see things like the following (taken from an
> 'rbd bench-write' process):
>
> $ cat */comm
> rbd
> log
> service
> admin_socket
> msgr-worker-0
> msgr-worker-1
> msgr-worker-2
> rbd
> ms_dispatch
> ms_local
> safe_timer
> fn_anonymous
> safe_timer
> safe_timer
> fn-radosclient
> tp_librbd
> safe_timer
> safe_timer
> taskfin_librbd
> signal_handler
>
> Those directories also have "cgroup" files which indicate which
> cgroup the thread is currently living under. For example, the
> "tp_librbd" thread is running under the following cgroups in my
> environment:
>
> 11:blkio:/
> 10:hugetlb:/
> 9:freezer:/
> 8:net_cls,net_prio:/
> 7:memory:/user.slice/user-1000.slice/user@1000.service
> 6:cpu,cpuacct:/
> 5:devices:/user.slice
> 4:perf_event:/
> 3:cpuset:/
> 2:pids:/user.slice/user-1000.slice/user@1000.service
> 1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
> 0::/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service

I checked:

root@n01:/proc/3668710/task# cat 3668748/comm
tp_librbd
root@n01:/proc/3668710/task#

So that seems to be rbd, right? I also checked the 'fn-radosclient' thread.
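The per-thread inspection above can be sketched as one loop that pairs each thread's `comm` with its cpu cgroup line, so threads like tp_librbd, fn-radosclient and msgr-worker-* can be matched to the emulator/vcpu cgroups in one pass. `QEMU_PID` is an assumed variable; it defaults to the current shell here only so the loop is runnable as-is:

```shell
#!/bin/bash
# Sketch: list <tid> <thread name> <cpu cgroup> for every thread of a PID.
# Matches the cpu,cpuacct controller (cgroup v1) or the unified 0:: line (v2).
list_thread_cgroups() {
  local pid=$1 t
  for t in /proc/"$pid"/task/*; do
    printf '%s\t%s\t%s\n' \
      "$(basename "$t")" \
      "$(cat "$t"/comm)" \
      "$(grep -m1 -E 'cpu,cpuacct|^0::' "$t"/cgroup)"
  done
}

# Replace with the qemu-kvm PID on a real host, e.g. QEMU_PID=3668710
QEMU_PID=${QEMU_PID:-$$}
list_thread_cgroups "$QEMU_PID"
```

Any librbd thread whose cpu cgroup ends in `/emulator` is accounted against the emulator quota rather than the vCPU quota.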
root@n01:/proc/3668710/task# cat 3668748/cgroup
12:hugetlb:/
11:memory:/machine/i-1551-77-VM.libvirt-qemu
10:freezer:/machine/i-1551-77-VM.libvirt-qemu
9:pids:/system.slice/libvirt-bin.service
8:rdma:/
7:cpu,cpuacct:/machine/i-1551-77-VM.libvirt-qemu/emulator
6:blkio:/machine/i-1551-77-VM.libvirt-qemu
5:cpuset:/machine/i-1551-77-VM.libvirt-qemu/emulator
4:devices:/machine/i-1551-77-VM.libvirt-qemu
3:perf_event:/machine/i-1551-77-VM.libvirt-qemu
2:net_cls,net_prio:/machine/i-1551-77-VM.libvirt-qemu
1:name=systemd:/system.slice/libvirt-bin.service
root@n01:/proc/3668710/task#

It seems that this RBD thread is in the 'emulator' cgroup, doesn't it? Is this what we want?

Wido

>
>> Wido
>>
>>>> Can somebody confirm this?
>>>>
>>>> Thanks,
>>>>
>>>> Wido
>>>
>>> [1] https://libvirt.org/cgroups.html

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
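Since the thread is already in the `/emulator` cgroup, one option suggested by the `virsh schedinfo` parameters shown earlier is to cap the vCPUs specifically rather than the whole machine cgroup, so the emulator side (where tp_librbd and friends live) keeps CPU headroom. A sketch, reusing the domain name from this thread; the 400000/100000 values are purely illustrative, not a recommendation:

```shell
# Sketch: limit the vCPU cgroup instead of the whole qemu-kvm process.
# vcpu_quota is interpreted per vCPU by libvirt; -1 means unlimited.
virsh schedinfo i-1551-77-VM --live \
    --set vcpu_period=100000 --set vcpu_quota=400000

# Verify the change took effect
virsh schedinfo i-1551-77-VM
```

With a vCPU-level cap in place, a guest workload that pegs its vCPUs can no longer starve the librbd threads running under the emulator cgroup.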