Hi there,
We have a production Ceph cluster with 12 OSDs spread over 6 hosts
running version 0.72.2.
From time to time, we're seeing some nasty multi-second latencies
(typically 1-3 seconds, sometimes as high as 5 seconds) inside QEMU VMs,
for both read and write loads.
The VMs are still responsive - we installed the relevant QEMU patches
for async rbd I/O a long time back, when we were seeing multi-second VM
stalls. I think all we've managed to do, however, is mask the real
underlying problem: the VM OS no longer stalls, but an individual
database I/O can still end up waiting for several seconds.
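For reference, this is the sort of single-threaded, synchronous fio run
we could use inside a guest to catch the spikes from the VM side (the
test file path, size and runtime are just placeholders, not something
we've settled on):

# One outstanding 4k sync write at a time; fio's completion latency
# percentiles should make the multi-second outliers visible.
fio --name=rbd-lat-probe --filename=/var/tmp/fio-test --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --direct=1 --sync=1 \
    --time_based --runtime=300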
For a while there seemed to be a pattern to these latency spikes,
roughly one every 30 minutes. We figured it might have something to do
with scrubbing and changed the default OSD settings a bit:
[osd]
osd op threads = 8
osd op thread timeout = 60
osd target transaction size = 50
osd max backfills = 1
osd recovery max active = 1
osd journal size = 10000
osd max scrubs = 1
osd scrub load threshold = 0.3
osd scrub min interval = 86400
osd scrub max interval = 604800
osd scrub stride = 65536
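One thing we haven't actually tried yet is ruling scrubbing in or out
directly instead of just tuning it, along these lines (assuming the
noscrub/nodeep-scrub flags are available on 0.72):

# Temporarily stop new scrubs cluster-wide and see whether the spikes persist
ceph osd set noscrub
ceph osd set nodeep-scrub
# ... observe for a while, then re-enable
ceph osd unset noscrub
ceph osd unset nodeep-scrub

# Check whether scrub completions in the cluster log line up with the spikes
grep scrub /var/log/ceph/ceph.log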
From what we can tell, the spikes now happen all the time, and we're
not sure whether they're related to cluster load.
We'd obviously like to get a better handle on the problem and would
appreciate suggestions on how we can better measure the phenomenon. We
get the following rados bench results (1 Gbps network, SATA disks, OSDs
on XFS, on the production cluster):
Total time run: 301.425932
Total writes made: 7064
Write size: 4194304
Bandwidth (MB/sec): 93.741
Stddev Bandwidth: 23.1038
Max bandwidth (MB/sec): 136
Min bandwidth (MB/sec): 0
Average Latency: 0.682643
Stddev Latency: 0.497225
Max latency: 3.75956
Min latency: 0.095493
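For what it's worth, we were planning to re-run the benchmark in a more
latency-oriented way and to pull per-OSD latency counters, roughly like
this (the rbd pool is just an example, and we're assuming 'ceph osd
perf' is available on 0.72):

# 4k writes with a single outstanding op, so per-op latency isn't
# hidden behind 16 concurrent 4 MB writes
rados bench -p rbd 300 write -b 4096 -t 1

# Per-OSD commit/apply latency, to see whether one disk stands out
ceph osd perf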
Are those latencies in seconds? What is typical? What should we expect?
I would imagine that if we're seeing multi-second latencies, something
must be wrong? General throughput doesn't seem too bad, but as I said,
it's the latency we're worried about here.
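We were also thinking of catching individual slow ops on the OSDs
through the admin socket when a spike hits, something like this (the
socket path follows the default naming and osd.3 is just an example):

# The slowest recent ops on this OSD, with a per-step timeline
ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_historic_ops

# Ops currently in flight, if we manage to catch a stall as it happens
ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_ops_in_flight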
We've also tried turning off hardware disk caches (thinking queueing
delays for commits requiring barriers might be a problem) and
experimented with various I/O schedulers in both the host and VM OSes.
So far, we've seen the best results with deadline on the hosts and noop
in the VM. It doesn't seem like disabling the on-disk caches made any
difference.
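For completeness, this is roughly how we've been toggling those
settings on the OSD hosts, plus what we watch during a spike (device
names are examples):

# Check / set the elevator on an OSD data disk
cat /sys/block/sdb/queue/scheduler
echo deadline > /sys/block/sdb/queue/scheduler

# Check / disable the on-disk write cache on a SATA drive
hdparm -W /dev/sdb
hdparm -W0 /dev/sdb

# Watch per-disk await and utilisation during a spike
iostat -x 1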
Any ideas?
Regards,
Edwin Peer