Re: Peering and disk utilization

Possibly related:
http://tracker.ceph.com/issues/5084

I'm seeing the same long delays during peering. Today, when I marked an OSD "out" and then "in" again a minute or two later, it was unexpectedly marked "down". I restarted it, and roughly eight minutes later things were fine again. In the meantime our RBD-backed KVM instances were blocking on I/O (writes especially), which made them unresponsive.
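
For anyone hitting the same thing, the sequence boils down to something like this (osd.12 is just a placeholder ID, and the restart command depends on your distro and init system):

    ceph osd out 12               # take the OSD out; data starts remapping
    # ...a minute or two later...
    ceph osd in 12                # bring it back in; this is where it was reported "down"
    service ceph restart osd.12   # restart the daemon on its host
    ceph -s                       # watch peering/recovery settle (~8 minutes here)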

On May 3, 2013, at 3:30 AM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:

> I'm not sure the problems we are seeing are the same, but it looks like it. Just a few hours ago, one slow OSD caused a lot of problems for us. It was somehow reported down, and while the cluster was trying to adjust, it reported that it had been wrongly marked down, so it seems some PGs were stuck in peering. We restarted the OSD, the cluster adjusted, and after a while it was reported down again and the whole process repeated. We then tried keeping the OSD down and setting noup, waited a while with no luck, and repeated that too. Even though there appears to be no hardware problem, we decided to mark the OSD out and start recovery.
> 
> The initial peering, as you said, seems so resource-intensive that it caused another ~10 OSDs to be reported down, which increased the number of PGs in peering, and then they all reported they had been wrongly marked down... We have already lowered all the recovery parameters; recovery takes about 2-3 hours now, but that makes no difference in the starting phase of the process, which can take up to 10 minutes. We have RBD-backed KVM instances and they are completely frozen for those 10 minutes. And if some PGs get stuck in peering, it takes manual intervention (a restart is what we could come up with) before anything can continue working.
> 
> We found http://www.spinics.net/lists/ceph-users/msg00009.html but it doesn't offer much. We run 0.56.4.
> 
> 
> On Thu, May 2, 2013 at 4:57 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> Hello,
> 
> Speaking of the rotating-media-under-filestore case (which must be the most common in Ceph deployments), can peering be made less greedy with disk operations without lengthening the entire 'blackhole' timeout, i.e. the window during which it blocks client operations? I'm suffering from a very long and very disk-intensive peering process even on relatively small reweights when there is a more or less significant amount of data committed on the underlying storage (50% utilization is very hard to deal with; 10% is far more acceptable). Recovery by itself can be throttled low enough not to compete with client disk I/O, but slowing down the peering process just means freezing client I/O for a longer time.
> 
> Cuttlefish seems to take over part of the disk controller's job of merging writes, but peering is still unacceptably long for an IOPS-intensive cluster (5Mb/s and 800 IOPS on every disk during peering; even with the controller aligning head movements, the disks are 100% busy). An SSD-based cluster should not die from lack of IOPS, but the price of such a thing is still closer to TrueEnterpriseStorage(tm) than to any solution I can afford.
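
As an aside, the recovery throttles Erdem mentions are usually the [osd] settings below. This is only a sketch with illustrative values; note that it throttles backfill/recovery, not the peering phase itself, which is the real problem here:

    [osd]
        osd max backfills = 1          ; at most one backfill per OSD
        osd recovery max active = 1    ; at most one active recovery op per OSD
        osd recovery op priority = 1   ; deprioritize recovery vs client ops
        osd client op priority = 63    ; keep client ops at the highest priority

They can also be changed at runtime, e.g. ceph tell osd.0 injectargs '--osd-max-backfills 1' (per OSD; wildcard support depends on your version). For the stuck-peering case, "ceph health detail", "ceph pg dump_stuck inactive" and "ceph pg <pgid> query" are the usual ways to find which PGs and OSDs are involved before resorting to a restart.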

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



