I'm not sure the problems we are seeing are the same, but it looks like it. Just a few hours ago, one slow OSD caused a lot of problems for us. It was somehow reported down, and while the cluster was trying to adjust, the OSD said it had been wrongly marked down, so some pgs seem to have been stuck in peering. We restarted the OSD and the cluster adjusted; after a while it was reported down again and the whole process repeated. We thought we should keep the OSD down, so we set noup and waited a while, with no luck, and repeated that too.

Even though there seems to be no hardware problem, we decided to set the OSD out and start recovery. Initial peering, as you said, seems so resource intensive that it caused another ~10 OSDs to be reported down, which increased the number of pgs in peering, and then they all said they had been wrongly marked down...

We had already lowered all the recovery parameters, so recovery takes about 2-3 hours now, but that doesn't make any difference in the starting phase of the recovery process, which may take up to 10 minutes. We have RBD-backed KVM instances, and they are totally frozen for those 10 minutes. And if some pgs are stuck in peering, it takes manual intervention (a restart is all we could come up with) before anything can actually continue working.

We've found http://www.spinics.net/lists/ceph-users/msg00009.html but it doesn't offer much. We run 0.56.4.
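
For anyone following along, the noup and out steps we went through look roughly like the small Python sketch below. It only builds the command lines and does not run anything. The OSD id 12 is a made-up placeholder, the helper name is hypothetical, and unsetting noup at the end is an assumption on my part, not something we actually describe doing above.

```python
def osd_isolation_commands(osd_id):
    """Build, in order, the ceph CLI commands for the sequence described above.

    Nothing is executed here; the caller decides whether and how to run them.
    """
    return [
        # Prevent OSDs that are down from being marked up again.
        ["ceph", "osd", "set", "noup"],
        # Mark the slow OSD out so its data is recovered onto other OSDs.
        ["ceph", "osd", "out", str(osd_id)],
        # Assumption: clear the flag afterwards so other OSDs can rejoin.
        ["ceph", "osd", "unset", "noup"],
    ]


if __name__ == "__main__":
    # 12 is a placeholder id, not the OSD from our cluster.
    for cmd in osd_isolation_commands(12):
        print(" ".join(cmd))
```

Printing the commands first and running them by hand seemed safer to us than scripting them blindly, given how the cluster reacted to each step.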
On Thu, May 2, 2013 at 4:57 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
Hello,
Speaking of the rotating-media-under-filestore case (which must be the
most common in Ceph deployments), can peering be less greedy for disk
operations without lengthening the entire 'blackhole timeout', i.e. the
time when it blocks client operations? I'm suffering from a very long
and very disk-intensive peering process even on relatively small
reweights when there is a more or less significant commit on the
underlying storage (50% is very hard to deal with, 10% of disk commit
is far more acceptable). Recovery by itself can be throttled low enough
to not compete with client disk I/O, but slowing the peering process
just means freezing client I/O for a longer time, that's all.
Cuttlefish seems to do a part of the disk controller's job of merging
writes, but peering is still unacceptably long for an _IOPS_-intensive
cluster (5Mb/s and 800 IOPS on every disk during peering; despite the
controller aligning head movements, disks are 100% busy). An SSD-based
cluster should not die from lack of IOPS, but prices for such a thing
are still closer to TrueEnterpriseStorage(tm) than to any solution I
can afford.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
erdem agaoglu