On Friday, 12 April 2013 at 10:04 -0500, Mark Nelson wrote:
> On 04/11/2013 07:25 PM, Ziemowit Pierzycki wrote:
> > No, I'm not using RDMA in this configuration since this will eventually
> > get deployed to production with 10G Ethernet (yes, RDMA is faster). I
> > would prefer Ceph because it has a storage driver built into OpenNebula,
> > which my company is using, and, as you mentioned, individual drives.
> >
> > I'm not sure what the problem is, but it appears to me that one of the
> > hosts may be holding up the rest... With Ceph, if one of the hosts is
> > much faster than the others, could this potentially slow the cluster
> > down to this level?
>
> Definitely! Even one slow OSD can cause dramatic slowdowns. This is
> because we (by default) try to distribute data evenly to every OSD in
> the cluster. If even one OSD is really slow, it will accumulate more and
> more outstanding operations while all of the other OSDs complete their
> requests. Eventually, all of your outstanding operations will be waiting
> on that slow OSD, and all of the other OSDs will sit idle waiting for
> new requests.
>
> If you know that some OSDs are permanently slower than others, you can
> re-weight them so that they receive fewer requests than the others,
> which can mitigate this, but that isn't always an optimal solution.
> Sometimes a slow OSD can also be a sign of other hardware problems.
>
> Mark

And are OSD response times logged anywhere, so that this "weak link" can be identified?

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
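
One way to look for a "weak link" OSD is to compare per-OSD latencies across the cluster. Below is a minimal sketch of that idea, assuming a Ceph release that provides `ceph osd perf` (which reports per-OSD commit/apply latency) and assuming the JSON field names shown in the comments; both the availability of the command and the exact schema vary by release, and the 3x-median threshold is only an illustration.

```python
#!/usr/bin/env python
# Sketch: flag OSDs whose commit latency is far above the cluster median.
# Assumes `ceph osd perf --format json` is available and returns something
# like {"osd_perf_infos": [{"id": 0, "perf_stats": {"commit_latency_ms": ...,
# "apply_latency_ms": ...}}, ...]}; field names differ between releases.
import json
import subprocess

def osd_latencies():
    out = subprocess.check_output(["ceph", "osd", "perf", "--format", "json"])
    data = json.loads(out)
    # Some releases nest the per-OSD list under "osdstats" instead.
    infos = data.get("osd_perf_infos") or data.get("osdstats", {}).get("osd_perf_infos", [])
    return {i["id"]: i["perf_stats"]["commit_latency_ms"] for i in infos}

def median(values):
    s = sorted(values)
    return s[len(s) // 2]

if __name__ == "__main__":
    lat = osd_latencies()
    if not lat:
        raise SystemExit("no OSD perf data returned")
    med = median(list(lat.values()))
    factor = 3  # arbitrary "much slower than the rest" threshold
    for osd_id, ms in sorted(lat.items()):
        flag = "  <-- slow?" if med and ms > factor * med else ""
        print("osd.%d commit latency %s ms%s" % (osd_id, ms, flag))
```

If this confirms that a particular OSD is consistently slow, Mark's suggestion applies: `ceph osd reweight <osd-id> <weight>` reduces the share of data (and therefore requests) mapped to it. Per-OSD counters, including operation latencies, can also be pulled from a running OSD through its admin socket with `perf dump` if the cluster-wide command is not available in your release.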