On Thu, Aug 22, 2019 at 11:29 AM Wido den Hollander <wido@xxxxxxxx> wrote:
>
> On 8/22/19 3:59 PM, Jason Dillaman wrote:
> > On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander <wido@xxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> In a couple of situations I have encountered Virtual Machines running
> >> on RBD that had a high I/O-wait, nearly 100%, on their vdX (VirtIO) or
> >> sdX (Virtio-SCSI) devices while they were performing CPU-intensive tasks.
> >>
> >> These servers would be running a very CPU-intensive application while
> >> *not* doing much disk I/O.
> >>
> >> However, I noticed that the I/O-wait of the disk(s) in the VM went up
> >> to 100%.
> >>
> >> This VM is CPU-limited by Libvirt, which puts the KVM process in its
> >> own cgroup with a CPU limitation.
> >>
> >> Now, my theory is:
> >>
> >> KVM (qemu-kvm) runs completely in userspace and librbd runs inside
> >> qemu-kvm as a library. All threads for disk I/O are part of the same
> >> PID and thus part of that cgroup.
> >>
> >> If a process inside the Virtual Machine now starts to consume all CPU
> >> time, there is nothing left for librbd, which slows it down.
> >>
> >> This then causes an increased I/O-wait inside the Virtual Machine,
> >> even though the VM is not performing a lot of disk I/O.
> >>
> >>
> >> Is my theory sane?
> >
> > Yes, I would say that your theory is sane. Have you looked into
> > libvirt's cgroup controls for limiting the emulator portion vs the
> > vCPUs [1]? I'd hope the librbd code and threads should be running in
> > the emulator cgroup (in a perfect world).
> >
>
> I checked with 'virsh schedinfo X' and this is the output I got:
>
> Scheduler      : posix
> cpu_shares     : 1000
> vcpu_period    : 100000
> vcpu_quota     : -1
> emulator_period: 100000
> emulator_quota : -1
> global_period  : 100000
> global_quota   : -1
> iothread_period: 100000
> iothread_quota : -1
>
>
> How can we confirm if the librbd code runs inside the Emulator part?

You can look under the "/proc/<QEMU PID>/task/<THREAD>/" directories. The
"comm" file has the thread's friendly name. If it's a librbd / librados
thread, you will see things like the following (taken from an 'rbd
bench-write' process):

$ cat */comm
rbd
log
service
admin_socket
msgr-worker-0
msgr-worker-1
msgr-worker-2
rbd
ms_dispatch
ms_local
safe_timer
fn_anonymous
safe_timer
safe_timer
fn-radosclient
tp_librbd
safe_timer
safe_timer
taskfin_librbd
signal_handler

Those directories also have "cgroup" files, which indicate which cgroup
the thread is currently living under. For example, the "tp_librbd" thread
is running under the following cgroups in my environment:

11:blkio:/
10:hugetlb:/
9:freezer:/
8:net_cls,net_prio:/
7:memory:/user.slice/user-1000.slice/user@1000.service
6:cpu,cpuacct:/
5:devices:/user.slice
4:perf_event:/
3:cpuset:/
2:pids:/user.slice/user-1000.slice/user@1000.service
1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
0::/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service

> Wido
>
> >> Can somebody confirm this?
> >>
> >> Thanks,
> >>
> >> Wido
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > [1] https://libvirt.org/cgroups.html
> >

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
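
To script the check Jason describes, something along the following lines
can be used. This is only a sketch: it assumes cgroup v1 (as in the output
above) and that QEMU_PID already holds the PID of the qemu-kvm process for
the domain in question. It prints each thread's "comm" name next to the
cpu,cpuacct cgroup it lives in, so the librbd / librados threads
(tp_librbd, msgr-worker-*, fn-radosclient, ...) can be compared against
the vCPU threads:

# Sketch: print each QEMU thread's name and its cpu,cpuacct cgroup (cgroup v1).
# Assumes QEMU_PID holds the PID of the qemu-kvm process, e.g. found via
# "pgrep -f <domain name>".
for t in /proc/"$QEMU_PID"/task/*; do
    printf '%-16s %s\n' "$(cat "$t"/comm)" "$(grep ':cpu,cpuacct:' "$t"/cgroup)"
done

With libvirt's default layout the vCPU threads end up in per-vCPU child
cgroups and the rest of the process in the "emulator" cgroup [1]; if the
librbd threads report the same cgroup path as the vCPUs, they are competing
for the same CPU quota and the high I/O-wait described above is what you
would expect.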