Hello,

We have an OpenStack installation running on Linux:

    host kernel: 3.13.0-35-generic
    qemu: QEMU emulator version 2.2.0 (Debian 1:2.2+dfsg-5expubuntu9.7~cloud5)

The host OS uses its own physical disk, while the guests use Ceph for all their storage needs. We have observed situations where the guests seem to hang/block for a few seconds, and we cannot figure out why.

I have created a simple script that records some parameters every second and writes them to a file every 10 minutes:

COUNT=600

date >> $LOGFILE
cut -f 1 -d' ' /proc/uptime >> $LOGFILE

while :
do
    for ((i=0; i<$COUNT; i++))
    do
        read UPT[$i] rest < /proc/uptime
        read cpu CPUAVG[$i] < /proc/stat
        /bin/sleep 1
    done
    for ((i=0; i<$COUNT; i++))
    do
        echo ${UPT[$i]}" "${CPUAVG[$i]} >> $LOGFILE
    done
done

With this script I caught situations where more than 2 seconds elapse between two consecutive records. This can be seen in the following series of data (the first column is the uptime, the rest are the counters from /proc/stat):

235503.52 21198 283 30839 23487544 3556 0 0 461168 0 0
235504.52 21198 283 30839 23487644 3556 0 0 461168 0 0
235505.53 21198 283 30839 23487744 3556 0 0 461168 0 0
235508.60 21199 283 30839 23487844 3556 0 0 461168 0 0   <--- 3 sec
235509.60 21200 283 30839 23487944 3556 0 0 461168 0 0
235510.61 21200 283 30839 23488044 3556 0 0 461168 0 0
235511.61 21200 283 30840 23488144 3556 0 0 461168 0 0
235513.98 21200 283 30840 23488244 3556 0 0 461168 0 0
235519.85 21201 283 30840 23488342 3558 0 0 461168 0 0   <--- 6 sec
235522.44 21201 283 30841 23488442 3558 0 0 461168 0 0   <--- 3 sec
235525.06 21202 283 30841 23488542 3558 0 0 461168 0 0   <--- 3 sec
235574.25 21212 283 30852 23493333 3561 0 0 461168 0 0
235577.44 21213 283 30852 23493433 3561 0 0 461168 0 0   <--- 3 sec
236616.81 21379 283 31108 23596808 3565 0 0 461168 0 0
236625.24 21380 283 31108 23596908 3565 0 0 461168 0 0   <--- 9 sec
236627.76 21380 283 31108 23597008 3565 0 0 461168 0 0
236630.02 21381 283 31108 23597066 3565 0 0 461168 0 0   <--- 3 sec
236632.54 21381 283 31108 23597101 3565 0 0 461168 0 0
236636.13 21382 283 31108 23597201 3565 0 0 461168 0 0   <--- 4 sec
237219.96 21476 283 31255 23655224 3573 0 0 461168 0 0
237222.56 21477 283 31255 23655324 3573 0 0 461168 0 0   <--- 3 sec
237225.55 21478 283 31255 23655424 3573 0 0 461168 0 0   <--- 3 sec

All of these seem to come in groups that rarely span more than a minute: the first 5 issues occur within 72 seconds (starting at 235505.53, ending at 235577.44), the next 3 within 20 seconds (236616.81 - 236636.13), and the last group spans 6 seconds (237219.96 - 237225.55).
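Instead of eyeballing the file, the gaps can also be flagged with a small awk sketch along these lines (assuming the log format produced by the script above, with the uptime in the first column and a threshold of 2 seconds):

awk '$1 ~ /^[0-9]+\.[0-9]+$/ {
         # first field looks like an uptime value; compare it with the previous record
         if (prev != "" && $1 - prev > 2)
             printf "%.2f s gap before uptime %s\n", $1 - prev, $1
         prev = $1
     }' "$LOGFILE"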
If I check the status of the host as recorded by sar for the same period, I get:

~$ sar -f /var/log/sysstat/sa05 -s 02:00:00 -e 04:00:00
Linux 3.13.0-35-generic (dp-c2-bl03)   03/05/2017   _x86_64_   (40 CPU)

02:05:01 AM   CPU   %user   %nice   %system   %iowait   %steal   %idle
02:15:01 AM   all    0.99    0.00      0.38      0.00     0.00   98.64
02:25:01 AM   all    1.00    0.00      0.38      0.00     0.00   98.62
02:35:01 AM   all    1.21    0.00      0.51      0.00     0.00   98.28
02:45:01 AM   all    1.01    0.00      0.38      0.00     0.00   98.61
02:55:01 AM   all    1.17    0.00      0.50      1.00     0.00   97.33   <--- iowait > 0 (this is when the first group happened)
03:05:01 AM   all    1.02    0.00      0.37      0.00     0.00   98.61
03:15:01 AM   all    1.22    0.00      0.55      0.81     0.00   97.42   <--- iowait > 0 (2nd group)
03:25:01 AM   all    1.04    0.00      0.45      0.38     0.00   98.13   <--- iowait > 0 (3rd group)
03:35:01 AM   all    1.04    0.00      0.41      0.00     0.00   98.55
03:45:01 AM   all    1.10    0.00      0.43      0.00     0.00   98.46
03:55:01 AM   all    1.00    0.00      0.38      0.00     0.00   98.62
Average:      all    1.07    0.00      0.43      0.20     0.00   98.30

All the other parameters recorded by sar seem normal. No load is seen on the disk:

~$ sar -f /var/log/sysstat/sa05 -s 02:00:00 -e 04:00:00 -d
Linux 3.13.0-35-generic (dp-c2-bl03)   03/05/2017   _x86_64_   (40 CPU)

02:05:01 AM      DEV    tps   rd_sec/s   wr_sec/s   avgrq-sz   avgqu-sz   await   svctm   %util
02:15:01 AM   dev8-0   0.76       0.00      26.01      34.30       0.00    0.01    0.01    0.00
02:25:01 AM   dev8-0   0.81       0.00      26.42      32.63       0.00    0.00    0.00    0.00
02:35:01 AM   dev8-0   0.76       0.00      25.93      34.20       0.00    0.00    0.00    0.00
02:45:01 AM   dev8-0   0.76       0.00      25.97      34.33       0.00    0.00    0.00    0.00
02:55:01 AM   dev8-0   0.75       0.00      25.91      34.47       0.00    0.01    0.01    0.00
03:05:01 AM   dev8-0   0.80       0.00      26.98      33.54       0.00    0.01    0.01    0.00
03:15:01 AM   dev8-0   0.74       0.00      24.92      33.68       0.00    0.02    0.02    0.00
03:25:01 AM   dev8-0   0.74       0.00      25.99      35.20       0.00    0.00    0.00    0.00
03:35:01 AM   dev8-0   0.77       0.00      26.57      34.29       0.00    0.02    0.02    0.00
03:45:01 AM   dev8-0   0.79       0.00      25.89      32.64       0.00    0.01    0.01    0.00
03:55:01 AM   dev8-0   0.77       0.00      26.03      33.66       0.00    0.00    0.00    0.00
Average:      dev8-0   0.77       0.00      26.06      33.88       0.00    0.01    0.01    0.00

~$ sar -f /var/log/sysstat/sa05 -s 02:00:00 -e 04:00:00 -S
Linux 3.13.0-35-generic (dp-c2-bl03)   03/05/2017   _x86_64_   (40 CPU)

02:05:01 AM   kbswpfree   kbswpused   %swpused   kbswpcad   %swpcad
02:15:01 AM     8388604           0       0.00          0      0.00
02:25:01 AM     8388604           0       0.00          0      0.00
02:35:01 AM     8388604           0       0.00          0      0.00
02:45:01 AM     8388604           0       0.00          0      0.00
02:55:01 AM     8388604           0       0.00          0      0.00
03:05:01 AM     8388604           0       0.00          0      0.00
03:15:01 AM     8388604           0       0.00          0      0.00
03:25:01 AM     8388604           0       0.00          0      0.00
03:35:01 AM     8388604           0       0.00          0      0.00
03:45:01 AM     8388604           0       0.00          0      0.00
03:55:01 AM     8388604           0       0.00          0      0.00
Average:        8388604           0       0.00          0      0.00

Has any of you seen something similar? What can cause the rise in iowait? (This is the only metric that looks strange in the given periods.) What should I do to track down the reason for these hangs? What can be done to avoid them? Any ideas are welcome.

Thank you,
Laszlo
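P.S. To line up a gap in the log with the host's sar intervals, something like the following sketch could help; uptime_to_wallclock is just an illustrative name, it has to be run inside the guest, and it assumes the guest has not rebooted since the record was written:

uptime_to_wallclock() {
    # Convert a recorded /proc/uptime value (seconds since guest boot) to
    # wall-clock time: boot time = now - current uptime, then add the record.
    local recorded=$1
    local now_epoch now_uptime
    now_epoch=$(date +%s)
    now_uptime=$(cut -f 1 -d' ' /proc/uptime)
    date -d "@$(awk -v n="$now_epoch" -v u="$now_uptime" -v r="$recorded" \
                    'BEGIN { printf "%d", n - u + r }')"
}

# Example: the 6-second gap above ended at uptime 235519.85
uptime_to_wallclock 235519.85

The resulting timestamp can then be matched against the sar interval in which iowait rose (e.g. the 02:55:01 sample for the first group).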