Hello,

I have a strange situation: on a host server we are running 5 VMs. The VMs have their disks provisioned by cinder from a ceph cluster and attached by qemu-kvm using librbd. The VMs apparently stopped working for a few seconds (10-20) and then continued their operations. I only have access to the host system.

Checking the values reported by sar, I can see a slight iowait appearing on the host (the problem occurred between 12:29:35 and 12:29:55):

    12:05:01 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    12:15:01 PM  all   3.16   0.00     0.55     0.00    0.00  96.29
    12:25:01 PM  all   3.34   0.00     0.73     0.00    0.00  95.93
    12:35:01 PM  all   3.65   0.00     0.94     1.44    0.00  93.97   <----- iowait is 1.44, the only nonzero value for the whole day
    12:45:01 PM  all   3.27   0.00     0.65     0.00    0.00  96.08
    12:55:01 PM  all   3.18   0.00     0.58     0.00    0.00  96.24

The only disk-based filesystem is /:

    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev             63G   12K   63G   1% /dev
    tmpfs            13G  1.4M   13G   1% /run
    /dev/sda1       275G   13G  249G   5% /
    none            4.0K     0  4.0K   0% /sys/fs/cgroup
    none            5.0M     0  5.0M   0% /run/lock
    none             63G   12K   63G   1% /run/shm
    none            100M     0  100M   0% /run/user

The sar values for that disk (dev8-0, i.e. sda) do not show anything unusual:

    12:05:01 PM     DEV   tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
    12:15:01 PM  dev8-0  0.79      0.00     25.20     31.76      0.00   0.00   0.00   0.00
    12:25:01 PM  dev8-0  0.80      0.00     25.52     31.90      0.00   0.00   0.00   0.00
    12:35:01 PM  dev8-0  0.76      0.01     25.15     33.18      0.00   0.01   0.01   0.00
    12:45:01 PM  dev8-0  0.79      0.00     25.01     31.46      0.00   0.01   0.01   0.00
    12:55:01 PM  dev8-0  0.80      0.00     25.84     32.44      0.00   0.00   0.00   0.00
    Average:     dev8-0  0.79      0.00     25.34     32.14      0.00   0.00   0.00   0.00

The VMs have their disks on the ceph cluster and access them using librbd.
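The sar rows above are averages over 10-minute intervals, so a 20-second event is heavily diluted. As a rough sketch, assuming that all of the excess over the usual value happened inside the 20-second window (12:29:35-12:29:55) and that the metric sat at its usual level for the rest of the 12:25:01-12:35:01 interval, the hypothetical helper below (not part of sar or sysstat) back-calculates the level during the burst:

```python
# Hypothetical helper, not part of sar/sysstat.
# sar reports an average over its sampling interval (600 s here), so a short
# burst is spread across the whole interval. Assuming the metric was at
# `baseline` outside the burst and constant inside it, recover the burst level.

def burst_value(interval_avg: float, baseline: float,
                interval_s: float, burst_s: float) -> float:
    """Estimate the metric's level during a burst of `burst_s` seconds."""
    if not 0 < burst_s <= interval_s:
        raise ValueError("burst must fit inside the sampling interval")
    # Total excess over the interval, in (metric units * seconds).
    excess = (interval_avg - baseline) * interval_s
    # Squeeze all of that excess into the burst window.
    return baseline + excess / burst_s


if __name__ == "__main__":
    # %iowait (all CPUs): 1.44 over 600 s, normally 0.00, event lasted ~20 s.
    print(round(burst_value(1.44, 0.0, 600, 20), 1))
    # vlan6 txkB/s: 505.42 over 600 s, normally about 328.90.
    print(round(burst_value(505.42, 328.90, 600, 20), 1))
```

Under that concentration assumption, the 1.44 %iowait row corresponds to roughly 43 % iowait (averaged over all CPUs) during the 20 seconds, and the vlan6 txkB/s row reported below (505.42 against about 328.90) to roughly 5600 kB/s. These are estimates that depend entirely on the assumption, not measurements; finer-grained sampling (a shorter sar interval) would be needed to confirm them.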
I can see a traffic peak on the storage interface (vlan6):

    12:05:01 PM  IFACE  rxpck/s  txpck/s  rxkB/s  txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
    12:15:01 PM  vlan6   157.49   148.81   42.45  328.90     0.00     0.00      0.00     0.00
    12:25:01 PM  vlan6   154.82   148.44   41.97  327.32     0.00     0.00      0.00     0.00
    12:35:01 PM  vlan6   157.22   154.34   47.12  505.42     0.00     0.00      0.00     0.00   <----- txkB/s goes up to 505 from a usual value of about 328
    12:45:01 PM  vlan6   152.60   147.00   41.15  319.85     0.00     0.00      0.00     0.00
    12:55:01 PM  vlan6   156.09   147.38   42.22  323.50     0.00     0.00      0.00     0.00
    Average:     vlan6   155.64   149.19   42.98  361.00     0.00     0.00      0.00     0.00

My question is: is it possible that the librbd access to the ceph cluster caused the iowait value observed on the host?

Thank you,
Laszlo

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com