latency when OSD falls out of cluster

Edwin Peer <edwin@xxxxxxxxxx> · Fri, 12 Jul 2013 08:03:34 +0200

Hi there,

We've been noticing nasty multi-second, cluster-wide latencies whenever an
OSD drops out of an active cluster (due to a power failure, or even when it
is stopped cleanly). We've also seen this problem occur when an OSD is
inserted back into the cluster.

Obviously, this freezes all VMs doing I/O across the cluster for several
seconds whenever a single node fails. Is this behaviour expected, or have I
perhaps configured something wrong?

We're trying very hard to eliminate all single points of failure in our
architecture. Is there anything that can be done about this?

Regards,
Edwin Peer
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com