On Friday, 12 April 2013 at 19:45 +0200, Olivier Bonvalet wrote:
> On Friday, 12 April 2013 at 10:04 -0500, Mark Nelson wrote:
> > On 04/11/2013 07:25 PM, Ziemowit Pierzycki wrote:
> > > No, I'm not using RDMA in this configuration since this will eventually
> > > get deployed to production with 10G ethernet (yes, RDMA is faster). I
> > > would prefer Ceph because it has a storage driver built into OpenNebula,
> > > which my company is using, and, as you mentioned, individual drives.
> > >
> > > I'm not sure what the problem is, but it appears to me that one of the
> > > hosts may be holding up the rest... with Ceph, if the performance of one
> > > of the hosts is much faster than others, could this potentially slow down
> > > the cluster to this level?
> >
> > Definitely! Even 1 slow OSD can cause dramatic slowdowns. This is
> > because we (by default) try to distribute data evenly to every OSD in
> > the cluster. If even 1 OSD is really slow, it will accumulate more and
> > more outstanding operations while all of the other OSDs complete their
> > requests. Eventually all of your outstanding operations will be waiting
> > on that slow OSD, and all of the other OSDs will sit idle waiting for
> > new requests.
> >
> > If you know that some OSDs are permanently slower than others, you can
> > re-weight them so that they receive fewer requests than the others, which
> > can mitigate this, but that isn't always an optimal solution. Sometimes
> > a slow OSD can be a sign of other hardware problems too.
> >
> > Mark
> >
>
> And are the OSD response times logged somewhere, so that the "weak
> link" can be identified?
>

I think I found the answer with the admin socket:

    ceph --admin-daemon /var/run/ceph/ceph-osd.14.asok perf dump

For example, in the output I can see op_latency, op_w_latency,
op_r_latency, op_rw_latency, etc.

Great.
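To compare OSDs against each other, the output of that command can be reduced to one average latency per OSD. Below is a minimal Python sketch, not a tested tool. It assumes that "perf dump" prints JSON, that the latency counters sit in a section named "osd", and that each one is an object with "avgcount" and "sum" fields; none of that is stated in this thread, so check it against your own output first. The helper names (perf_dump, avg_latency, rank_osds) and the sample numbers are made up for illustration.

```python
import json
import subprocess


def perf_dump(socket_path):
    """Run `ceph --admin-daemon <socket> perf dump` and parse its output.

    Assumption: the command prints a JSON document on stdout.
    """
    out = subprocess.check_output(
        ["ceph", "--admin-daemon", socket_path, "perf", "dump"]
    )
    return json.loads(out)


def avg_latency(dump, counter="op_latency", section="osd"):
    """Return sum / avgcount for one counter, or 0.0 if it has no samples.

    Assumption: counters look like {"avgcount": <ops>, "sum": <seconds>}.
    """
    entry = dump.get(section, {}).get(counter, {})
    count = entry.get("avgcount", 0)
    if not count:
        return 0.0
    return entry["sum"] / count


def rank_osds(dumps, counter="op_latency"):
    """Return (osd_name, average_latency) pairs, slowest OSD first."""
    pairs = [(name, avg_latency(d, counter)) for name, d in dumps.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up sample data standing in for two real perf dumps.
sample = {
    "osd.13": {"osd": {"op_latency": {"avgcount": 1000, "sum": 5.0}}},
    "osd.14": {"osd": {"op_latency": {"avgcount": 1000, "sum": 250.0}}},
}

for name, latency in rank_osds(sample):
    print(name, latency)
```

On a real host, the sample dictionary would be replaced by something like {"osd.14": perf_dump("/var/run/ceph/ceph-osd.14.asok")}, with one entry per OSD socket found on that machine. The same ranking can be repeated with op_r_latency or op_w_latency as the counter argument to see whether reads or writes are the slow side.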