Hi guys,
I'm testing Ceph as storage for KVM virtual machine images and have run
into a problem that I'm hoping it is possible to track down the cause of.
I'm running a single KVM Linux guest on top of Ceph storage. In that
guest I run rsync to download files from the internet. When rsync is
running, the guest will seemingly stall and run very slowly.
For example, if I log in to the guest via SSH and use the command prompt,
nothing happens for a long period (30+ seconds), then a few typed
characters are processed, then it blocks for another long period, then
processes a bit more, and so on.
I was hoping to be able to tweak the system so that it behaves more like
conventional storage - i.e. perhaps the rsync won't be super fast, but
the machine stays equally responsive the whole time.
Can you provide some hints on how best to benchmark or test the system
to find the cause of this?
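If it helps, I can run the built-in RADOS benchmark against the pool,
e.g. something like this (I'm guessing at sensible options here):

  rados -p data bench 60 write -t 16   # 60 s of object writes, 16 concurrent ops
  rados -p data bench 60 seq           # then read the objects back sequentially

though I suspect that says more about raw throughput than about the
latency problem I'm seeing.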
The Ceph OSDs periodically log these two messages, which I do not fully
understand:
2012-12-30 17:07:12.894920 7fc8f3242700 1 heartbeat_map is_healthy
'OSD::op_tp thread 0x7fc8cbfff700' had timed out after 30
2012-12-30 17:07:13.599126 7fc8cbfff700 1 heartbeat_map reset_timeout
'OSD::op_tp thread 0x7fc8cbfff700' had timed out after 30
Is this to be expected when the system is in use, or does it indicate
that something is wrong?
Ceph also logs messages such as this:
2012-12-30 17:07:36.932272 osd.0 10.0.0.1:6800/9157 286340 : [WRN] slow
request 30.751940 seconds old, received at 2012-12-30 17:07:06.180236:
osd_op(client.4705.0:16074961 rb.0.11b7.4a933baa.0000000c188f [write
532480~4096] 0.f2a63fe) v4 currently waiting for sub ops
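I can watch these warnings arrive in real time with:

  ceph -w               # streams the cluster log, including these [WRN] lines
  ceph health detail    # overall warnings (not sure slow requests show here in 0.55)

but that doesn't tell me what the sub ops are actually waiting for.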
My setup:
3 servers running Fedora 17 with Ceph 0.55.1 from RPM.
Each server runs one osd and one mon. One of the servers also runs an mds.
The backing file system is btrfs stored on md-raid. The journal is stored
on the same SATA disks as the rest of the data.
Each server has 3 bonded gigabit/sec NICs.
One server running Fedora 16 with qemu-kvm. It has one gigabit/sec NIC
connected to the same network as the Ceph servers, and another
gigabit/sec NIC connected to the Internet.
The disk is attached with:
-drive format=rbd,file=rbd:data/image1:rbd_cache=1,if=virtio
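One thing I still want to try is also enabling writeback caching on the
QEMU side, i.e. something like the following (assuming this qemu version
honours cache= for rbd):

-drive format=rbd,file=rbd:data/image1:rbd_cache=true,if=virtio,cache=writeback

but I haven't verified whether rbd_cache=1 alone is sufficient here.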
iostat on the KVM guest gives:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,00    0,00    0,00  100,00    0,00    0,00

Device:   rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await   svctm  %util
vda         0,00    1,40   0,10   0,30    0,80   13,60    36,00     1,66 2679,25 2499,75  99,99
Top on the KVM host shows 90% CPU idle and 0.0% I/O waiting.
iostat on an OSD gives:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,13    0,00    1,50   15,79    0,00   82,58

Device:   rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda       240,70  441,20  33,00  42,70 1122,40 1961,80    81,48    14,45  164,42  319,14   44,85   6,63  50,22
sdb       299,10  393,10  33,90  38,40 1363,60 1720,60    85,32    13,55  171,32  316,21   43,41   6,55  47,39
sdc       268,50  441,60  28,80  45,40 1191,60 1977,00    85,41    19,08  159,39  345,98   41,02   6,56  48,69
sdd       255,50  445,50  30,20  45,00 1150,40 1975,80    83,14    18,18  155,97  338,90   33,20   6,95  52,23
md0         0,00    0,00   1,20 132,70    4,80 4086,40    61,11     0,00    0,00    0,00    0,00   0,00   0,00
The figures are similar on all three OSDs.
I am thinking that one possible cause could be that the journal is
stored on the same disks as the rest of the data, but I don't know how
to benchmark whether this is actually the case.
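If it would help narrow this down, I could repeat the test with the
journals on separate devices by pointing each OSD at one in ceph.conf,
something like this (the device path is just an example - I'd use a
spare partition or an SSD):

[osd]
    osd journal size = 1024

[osd.0]
    osd journal = /dev/sde1    ; example spare device, not my current setup

As a crude check of how the current disks handle the journal's
synchronous writes, I also considered something like:

  dd if=/dev/zero of=/srv/osd.0/ddtest bs=4k count=1000 oflag=dsync

(path hypothetical), but I'm not sure how meaningful that is on top of
btrfs and md-raid.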
Thanks for any help or advice you can offer!
--
Jens Kristian Søgaard, Mermaid Consulting ApS,
jens@xxxxxxxxxxxxxxxxxxxx,
http://www.mermaidconsulting.com/