Hi there,
We have a production Ceph cluster with 12 OSDs spread over 6 hosts
running version 0.72.2.
From time to time, we're seeing some nasty multi-second latencies
(typically 1-3 seconds, sometimes as high as 5 seconds) inside QEMU VMs,
for both read and write loads.
The VMs are still responsive - we installed the relevant QEMU patches
for async rbd I/O a long time back, when we were seeing multi-second VM
stalls. I think all we've managed to do, however, is mask the real
underlying problem: the VM OS no longer stalls, but an individual
database I/O can still end up waiting for several seconds.
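For reference, this is the sort of single-threaded, synchronous fio run
we could use inside a guest to catch the spikes from the VM side (the
test file path, size and runtime are just placeholders, not something
we've settled on):

# One outstanding 4k sync write at a time; fio's completion latency
# percentiles should make the multi-second outliers visible.
fio --name=rbd-lat-probe --filename=/var/tmp/fio-test --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --direct=1 --sync=1 \
    --time_based --runtime=300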
For a while there seemed to be a pattern to these latency spikes,
roughly one every 30 minutes. We figured it might have something to do
with scrubbing and changed the default OSD settings a bit:
[osd]
osd op threads = 8
osd op thread timeout = 60
osd target transaction size = 50
osd max backfills = 1
osd recovery max active = 1
osd journal size = 10000
osd max scrubs = 1
osd scrub load threshold = 0.3
osd scrub min interval = 86400
osd scrub max interval = 604800
osd scrub stride = 65536
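One thing we haven't actually tried yet is ruling scrubbing in or out
directly instead of just tuning it, along these lines (assuming the
noscrub/nodeep-scrub flags are available on 0.72):

# Temporarily stop new scrubs cluster-wide and see whether the spikes persist
ceph osd set noscrub
ceph osd set nodeep-scrub
# ... observe for a while, then re-enable
ceph osd unset noscrub
ceph osd unset nodeep-scrub

# Check whether scrub completions in the cluster log line up with the spikes
grep scrub /var/log/ceph/ceph.log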
From what we can tell, the spikes now happen all the time, and we're
not sure whether they're related to cluster load.
We'd obviously like to get a better handle on the problem and would
appreciate suggestions on how we can better measure the phenomenon. We
get the following rados bench results (1 Gbps network, SATA disks, OSDs
on XFS, on the production cluster):
Total time run: 301.425932
Total writes made: 7064
Write size: 4194304
Bandwidth (MB/sec): 93.741
Stddev Bandwidth: 23.1038
Max bandwidth (MB/sec): 136
Min bandwidth (MB/sec): 0
Average Latency: 0.682643
Stddev Latency: 0.497225
Max latency: 3.75956
Min latency: 0.095493
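For what it's worth, we were planning to re-run the benchmark in a more
latency-oriented way and to pull per-OSD latency counters, roughly like
this (the rbd pool is just an example, and we're assuming 'ceph osd
perf' is available on 0.72):

# 4k writes with a single outstanding op, so per-op latency isn't
# hidden behind 16 concurrent 4 MB writes
rados bench -p rbd 300 write -b 4096 -t 1

# Per-OSD commit/apply latency, to see whether one disk stands out
ceph osd perf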
Are those latencies in seconds? What is typical? What should we expect?
I would imagine that if we're seeing multi-second latencies, something
must be wrong? General throughput doesn't seem too bad, but as I said,
it's the latency we're worried about here.
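We were also thinking of catching individual slow ops on the OSDs
through the admin socket when a spike hits, something like this (the
socket path follows the default naming and osd.3 is just an example):

# The slowest recent ops on this OSD, with a per-step timeline
ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_historic_ops

# Ops currently in flight, if we manage to catch a stall as it happens
ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_ops_in_flight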
We've also tried turning off hardware disk caches (thinking queueing
delays for commits requiring barriers might be a problem) and
experimented with various I/O schedulers in both the host and VM OSes.
So far, we've seen the best results with deadline on the hosts and noop
in the VM. It doesn't seem like disabling the on-disk caches made any
difference.
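For completeness, this is roughly how we've been toggling those
settings on the OSD hosts, plus what we watch during a spike (device
names are examples):

# Check / set the elevator on an OSD data disk
cat /sys/block/sdb/queue/scheduler
echo deadline > /sys/block/sdb/queue/scheduler

# Check / disable the on-disk write cache on a SATA drive
hdparm -W /dev/sdb
hdparm -W0 /dev/sdb

# Watch per-disk await and utilisation during a spike
iostat -x 1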
Any ideas?
Regards,
Edwin Peer