Can librbd operations increase iowait?

Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> · Tue, 28 Feb 2017 22:15:03 +0200

Hello,

I have a strange situation:
On a host server we are running 5 VMs. The VMs have their disks provisioned by cinder from a ceph cluster and are attached by qemu-kvm using librbd.
The VMs apparently stopped working for a few seconds (10-20) and then resumed their operations.
I only have access to the host system. Checking the values reported by sar I can see the following:

A slight iowait appears on the host (the problem occurred between 12:29:35 and 12:29:55):

12:05:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:15:01 PM     all      3.16      0.00      0.55      0.00      0.00     96.29
12:25:01 PM     all      3.34      0.00      0.73      0.00      0.00     95.93
12:35:01 PM     all      3.65      0.00      0.94      1.44      0.00     93.97 <----- iowait is 1.44, the only nonzero value for the whole day
12:45:01 PM     all      3.27      0.00      0.65      0.00      0.00     96.08
12:55:01 PM     all      3.18      0.00      0.58      0.00      0.00     96.24
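
A rough back-of-the-envelope check on that number, as a sketch only: the sar rows above are 10-minute averages, so if one assumes that all of the iowait in the 12:35:01 row fell inside the 20-second window of the stall (an assumption, not something sar can confirm), the 1.44% average would correspond to a much higher value during the stall itself. The function name below is made up for illustration.

```python
# Sketch: scale a 10-minute sar average to a short stall window.
# Assumption: all iowait of the interval happened inside the stall window.

def peak_iowait_pct(avg_pct, interval_s, window_s):
    """Return the iowait percentage inside the window if the whole
    interval average was concentrated there."""
    return avg_pct * interval_s / window_s

INTERVAL_S = 600      # sar sampling interval seen above (10 minutes)
AVG_IOWAIT_PCT = 1.44  # value from the 12:35:01 row

for window_s in (10, 20):
    peak = peak_iowait_pct(AVG_IOWAIT_PCT, INTERVAL_S, window_s)
    print(f"{window_s:>2} s stall: about {peak:.1f}% iowait during the stall")
```

Under that assumption the 20-second stall would have seen roughly 43% iowait averaged over all CPUs, which is far from "slight".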

The only disk-based filesystem is / :
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             63G   12K   63G   1% /dev
tmpfs            13G  1.4M   13G   1% /run
/dev/sda1       275G   13G  249G   5% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none             63G   12K   63G   1% /run/shm
none            100M     0  100M   0% /run/user

The sar values for the disk do not show anything unusual:
12:05:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:15:01 PM    dev8-0      0.79      0.00     25.20     31.76      0.00      0.00      0.00      0.00
12:25:01 PM    dev8-0      0.80      0.00     25.52     31.90      0.00      0.00      0.00      0.00
12:35:01 PM    dev8-0      0.76      0.01     25.15     33.18      0.00      0.01      0.01      0.00
12:45:01 PM    dev8-0      0.79      0.00     25.01     31.46      0.00      0.01      0.01      0.00
12:55:01 PM    dev8-0      0.80      0.00     25.84     32.44      0.00      0.00      0.00      0.00
Average:       dev8-0      0.79      0.00     25.34     32.14      0.00      0.00      0.00      0.00

As mentioned above, the VMs have their disks on the ceph cluster and access them using librbd.
I can see a traffic peak on the storage interface:
12:05:01 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
12:15:01 PM     vlan6    157.49    148.81     42.45    328.90      0.00      0.00      0.00      0.00
12:25:01 PM     vlan6    154.82    148.44     41.97    327.32      0.00      0.00      0.00      0.00
12:35:01 PM     vlan6    157.22    154.34     47.12    505.42      0.00      0.00      0.00      0.00  <----- txkB/s goes up to 505 from an average of about 328
12:45:01 PM     vlan6    152.60    147.00     41.15    319.85      0.00      0.00      0.00      0.00
12:55:01 PM     vlan6    156.09    147.38     42.22    323.50      0.00      0.00      0.00      0.00
Average:        vlan6    155.64    149.19     42.98    361.00      0.00      0.00      0.00      0.00
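
To put the peak in perspective, here is a small sketch that estimates how much extra data was sent during the 12:35:01 interval compared with the other four intervals. It assumes a 600-second interval and uses the mean of the other rows as the baseline; both the baseline choice and the helper name are illustrative assumptions.

```python
# Sketch: estimate the extra kB transmitted on vlan6 in the spike interval.
# Assumption: baseline = mean txkB/s of the non-spike rows in the table above.

tx_kb_per_s = {
    "12:15:01": 328.90,
    "12:25:01": 327.32,
    "12:35:01": 505.42,  # interval containing the stall
    "12:45:01": 319.85,
    "12:55:01": 323.50,
}

def excess_kb(samples, spike_key, interval_s):
    """Extra kB sent in the spike interval relative to the mean of the others."""
    others = [v for k, v in samples.items() if k != spike_key]
    baseline = sum(others) / len(others)
    return (samples[spike_key] - baseline) * interval_s

extra = excess_kb(tx_kb_per_s, "12:35:01", 600)
print(f"extra data sent in the interval: about {extra:.0f} kB")
```

That is on the order of 100 MB of additional outbound traffic within the 10 minutes, which on its own says nothing about whether it went out as a burst around the stall or was spread over the interval.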

My question is: is it possible that librbd access to the ceph cluster caused the iowait value observed on the host?

Thank you,
Laszlo
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com