Re: rbd export from corrupted cluster

2011/5/2 Sage Weil <sage@xxxxxxxxxxxx>:
> On Mon, 2 May 2011, Christian Brunner wrote:
>> after a series of hardware defects, I have a corrupted ceph cluster:
>>
>> 2011-05-02 18:12:31.038446    pg v8171648: 3712 pgs: 26 active, 3663
>> active+clean, 5 crashed+peering, 18 active+clean+inconsistent; 547 GB
>> data, 388 GB used, 51922 GB / 78245 GB avail; 2410/284300 degraded
>> (0.848%)
>>
>> Now I wanted to export an rbd image with "rbd export" and run a
>> filesystem check on the image. The only problem is that the export
>> blocks on the first corrupted object. I think it would be better to
>> detect the failure and return the affected blocks filled with zeros.
>>
>> Is there a way to accomplish this?
>
> Not currently.  There are a couple ways to approach it.
>
> One would be to add a timeout (either in librados or in the rados tool) so
> that it can move past unresponsive blocks (or error out).  Maybe a 'skip
> this block range' option would be a piece of that.
>
> The other is to give the client explicit feedback when the pg it is
> attempting to access is not available (in your case, it's the peering pgs
> that are blocking progress).  Currently those requests block at the OSD
> until peering completes, but a peering bug is preventing progress.
>
> Of course, we also need to fix the peering issue itself (any logs you can
> provide showing it blocking would help).  If you run 'ceph pg dump', note
> which pgs are stuck peering, and restart those osds with full logging, we
> can see where things are getting hung up.

We had a rather old version running (0.24.1), so I don't think that
debugging it makes much sense.
I have updated to 0.27 now.

> Probably, though, we still want a way to do useful work
> (partial/incomplete export) even when things are half-broken.

This was my intent when I wrote the email.

In a larger cluster the probability of losing multiple disks at a
time increases. The amount of data you lose when that happens is
minimal, but since rbd images are striped across many disks,
chances are that you lose a single block in many images.
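
For illustration, here is a rough sketch of what the timeout-and-zero-fill
idea could look like on the client side, using the librados C AIO API.
The pool name, block_name_prefix, object size and object count below are
placeholders (a real tool would read them from the rbd image header), and
error checking on the setup calls is omitted; it is only meant to show the
idea of polling each object read against a deadline and exporting zeros
for anything that does not respond:

/*
 * Sketch: export an rbd image object by object, writing zeros for any
 * object whose read fails or does not complete within a deadline.
 * Placeholder values are marked; not a complete tool.
 */
#include <rados/librados.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define OBJ_SIZE  (4ULL * 1024 * 1024)  /* default rbd object size */
#define TIMEOUT_S 30                    /* give up on an object after 30s */

/* Read one object into buf, or hand back a zero-filled buffer if the read
 * fails or times out.  Returns the buffer to use afterwards (the original,
 * or a fresh one if the old read is still in flight). */
static char *read_obj_or_zero(rados_ioctx_t io, const char *oid, char *buf)
{
    rados_completion_t c;
    int i;

    memset(buf, 0, OBJ_SIZE);           /* short reads leave zeros in the tail */
    rados_aio_create_completion(NULL, NULL, NULL, &c);
    if (rados_aio_read(io, oid, c, buf, OBJ_SIZE, 0) < 0) {
        rados_aio_release(c);
        return buf;                     /* e.g. missing object: export zeros */
    }

    /* Poll instead of blocking, so a hung (e.g. peering) pg cannot stall
     * the whole export. */
    for (i = 0; i < TIMEOUT_S * 10 && !rados_aio_is_complete(c); i++)
        usleep(100000);

    if (!rados_aio_is_complete(c)) {
        /* The read may still finish later and write into buf, so abandon
         * both the completion and the buffer and return a fresh one. */
        fprintf(stderr, "timed out on %s, filling with zeros\n", oid);
        return calloc(1, OBJ_SIZE);
    }

    if (rados_aio_get_return_value(c) < 0) {
        fprintf(stderr, "read error on %s, filling with zeros\n", oid);
        memset(buf, 0, OBJ_SIZE);
    }
    rados_aio_release(c);
    return buf;
}

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    char *buf = malloc(OBJ_SIZE);
    char oid[128];
    const char *prefix = "rb.0.1234";   /* placeholder block_name_prefix */
    uint64_t i, num_objs = 256;         /* placeholder: image size / OBJ_SIZE */
    FILE *out = fopen("image.raw", "w");

    rados_create(&cluster, NULL);
    rados_conf_read_file(cluster, NULL);  /* default ceph.conf locations */
    rados_connect(cluster);
    rados_ioctx_create(cluster, "rbd", &io);

    for (i = 0; i < num_objs; i++) {
        snprintf(oid, sizeof(oid), "%s.%012llx", prefix,
                 (unsigned long long)i);
        buf = read_obj_or_zero(io, oid, buf);
        fwrite(buf, 1, OBJ_SIZE, out);
    }

    fclose(out);
    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    free(buf);
    return 0;
}

Something like this would at least let you fsck the partial image, and it
would be easy to log which object ranges were zero-filled so you know what
was lost.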

Christian

