On Mon, Aug 26, 2019 at 5:01 AM Wido den Hollander <wido@xxxxxxxx> wrote:
>
> On 8/22/19 5:49 PM, Jason Dillaman wrote:
> > On Thu, Aug 22, 2019 at 11:29 AM Wido den Hollander <wido@xxxxxxxx> wrote:
> >>
> >> On 8/22/19 3:59 PM, Jason Dillaman wrote:
> >>> On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander <wido@xxxxxxxx> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> In a couple of situations I have encountered that Virtual Machines
> >>>> running on RBD had a high I/O-wait, nearly 100%, on their vdX (VirtIO)
> >>>> or sdX (Virtio-SCSI) devices while they were performing CPU intensive tasks.
> >>>>
> >>>> These servers would be running a very CPU intensive application while
> >>>> *not* doing that much disk I/O.
> >>>>
> >>>> I however noticed that the I/O-wait of the disk(s) in the VM went up to
> >>>> 100%.
> >>>>
> >>>> This VM is CPU limited by Libvirt by putting that KVM process in its
> >>>> own cgroup with a CPU limitation.
> >>>>
> >>>> Now, my theory is:
> >>>>
> >>>> KVM (qemu-kvm) is completely userspace and librbd runs inside qemu-kvm
> >>>> as a library. All threads for disk I/O are part of the same PID and thus
> >>>> part of that cgroup.
> >>>>
> >>>> If a process inside the Virtual Machine now starts to consume all CPU
> >>>> time there is nothing left for librbd which slows it down.
> >>>>
> >>>> This then causes an increased I/O-wait inside the Virtual Machine. Even
> >>>> though the VM is not performing a lot of disk I/O. The wait of the I/O
> >>>> goes up due to this.
> >>>>
> >>>>
> >>>> Is my theory sane?
> >>>
> >>> Yes, I would say that your theory is sane. Have you looked into
> >>> libvirt's cgroup controls for limiting the emulator portion vs the
> >>> vCPUs [1]? I'd hope the librbd code and threads should be running in
> >>> the emulator cgroup (in a perfect world).
> >>>
> >>
> >> I checked with 'virsh schedinfo X' and this is the output I got:
> >>
> >> Scheduler      : posix
> >> cpu_shares     : 1000
> >> vcpu_period    : 100000
> >> vcpu_quota     : -1
> >> emulator_period: 100000
> >> emulator_quota : -1
> >> global_period  : 100000
> >> global_quota   : -1
> >> iothread_period: 100000
> >> iothread_quota : -1
> >>
> >>
> >> How can we confirm if the librbd code runs inside the Emulator part?
> >
> > You can look under the "/proc/<QEMU PID>/task/<THREAD>/" directories.
> > The "comm" file has the thread friendly name. If it's a librbd /
> > librados thread you will see things like the following (taken from an
> > 'rbd bench-write' process):
> >
> > $ cat */comm
> > rbd
> > log
> > service
> > admin_socket
> > msgr-worker-0
> > msgr-worker-1
> > msgr-worker-2
> > rbd
> > ms_dispatch
> > ms_local
> > safe_timer
> > fn_anonymous
> > safe_timer
> > safe_timer
> > fn-radosclient
> > tp_librbd
> > safe_timer
> > safe_timer
> > taskfin_librbd
> > signal_handler
> >
> > Those directories also have "cgroup" files which will indicate which
> > cgroup the thread is currently living under. For example, the
> > "tp_librbd" thread is running under the following cgroups in my
> > environment:
> >
> > 11:blkio:/
> > 10:hugetlb:/
> > 9:freezer:/
> > 8:net_cls,net_prio:/
> > 7:memory:/user.slice/user-1000.slice/user@1000.service
> > 6:cpu,cpuacct:/
> > 5:devices:/user.slice
> > 4:perf_event:/
> > 3:cpuset:/
> > 2:pids:/user.slice/user-1000.slice/user@1000.service
> > 1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
> > 0::/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
> >
>
> I checked:
>
> root@n01:/proc/3668710/task# cat 3668748/comm
> tp_librbd
> root@n01:/proc/3668710/task#
>
> So that seems to be rbd right? I also checked the 'fn-radosclient' thread.
>
> root@n01:/proc/3668710/task# cat 3668748/cgroup
> 12:hugetlb:/
> 11:memory:/machine/i-1551-77-VM.libvirt-qemu
> 10:freezer:/machine/i-1551-77-VM.libvirt-qemu
> 9:pids:/system.slice/libvirt-bin.service
> 8:rdma:/
> 7:cpu,cpuacct:/machine/i-1551-77-VM.libvirt-qemu/emulator
> 6:blkio:/machine/i-1551-77-VM.libvirt-qemu
> 5:cpuset:/machine/i-1551-77-VM.libvirt-qemu/emulator
> 4:devices:/machine/i-1551-77-VM.libvirt-qemu
> 3:perf_event:/machine/i-1551-77-VM.libvirt-qemu
> 2:net_cls,net_prio:/machine/i-1551-77-VM.libvirt-qemu
> 1:name=systemd:/system.slice/libvirt-bin.service
> root@n01:/proc/3668710/task#
>
> It seems that this RBD thread is in the 'emulator', isn't it?
>
> Is this what we want?

Yup, that looks good to me. I would then double-check your cgroups to
see where the CPU restriction is being placed. If it's only at
"/machine/i-1551-77-VM.libvirt-qemu", then the emulator and vcpu
cgroups will be sharing time vs if each vcpu had its own restriction.

> Wido
>
> >
> >> Wido
> >>
> >>>> Can somebody confirm this?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Wido
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> ceph-users@xxxxxxxxxxxxxx
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>> [1] https://libvirt.org/cgroups.html
> >>>
> >
> >

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
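
For anyone repeating the per-thread check described in this thread, the manual "cat comm" / "cat cgroup" steps could be scripted roughly as below. This is only a sketch, not something posted by either participant: the function names (`cpu_cgroup_of`, `thread_cgroups`) and the `proc_root` parameter are made up for illustration, and it assumes the cgroup v1 line format ("ID:controllers:path") seen in the outputs above. On a cgroup v2-only host there is no "cpu" controller line, so the cgroup column would come back as None.

```python
import sys
from pathlib import Path


def cpu_cgroup_of(cgroup_text):
    """Return the path on the cgroup v1 'cpu' controller line, or None."""
    for line in cgroup_text.splitlines():
        # Each line looks like "7:cpu,cpuacct:/machine/<vm>/emulator".
        parts = line.split(":", 2)
        if len(parts) == 3 and "cpu" in parts[1].split(","):
            return parts[2]
    return None


def thread_cgroups(pid, proc_root="/proc"):
    """Map each thread ID of `pid` to (thread name, cpu cgroup path)."""
    result = {}
    for task in sorted(Path(proc_root, str(pid), "task").iterdir()):
        try:
            comm = (task / "comm").read_text().strip()
            cgroup = cpu_cgroup_of((task / "cgroup").read_text())
        except OSError:
            continue  # the thread exited while we were scanning
        result[int(task.name)] = (comm, cgroup)
    return result


if __name__ == "__main__":
    # Usage: python3 thread_cgroups.py <QEMU PID>
    for tid, (comm, cgroup) in thread_cgroups(int(sys.argv[1])).items():
        print(f"{tid}\t{comm}\t{cgroup}")
```

Run against the QEMU PID, threads such as tp_librbd or fn-radosclient should then show a path ending in "/emulator" if they live in the emulator cgroup, as in the output quoted above.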