Re: Replacing a failed disk/OSD: unfound object

Thanks Tommi. I rebuilt the Ceph cluster a few times just to reproduce the situation. The results are mixed; most likely btrfs failed (after a power reset). But the problem does happen regardless.

The big question is: however rare, an unfound-object situation makes the *entire* Ceph file system unmountable, which amounts to a total loss of all data. That is quite a risk to take for a production system. Is there a way to recover from such a situation? (e.g. remove the file associated with the missing object)
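Something along these lines is what I am hoping for (the pg id 0.6 below is made up for illustration, and I have not verified these commands against this release):

    # Show which pgs are reporting unfound objects
    ceph health detail

    # List the unfound objects in the affected pg (pg id is hypothetical)
    ceph pg 0.6 list_missing

    # Give up on the unfound objects and revert to prior versions,
    # accepting the loss of those objects
    ceph pg 0.6 mark_unfound_lost revert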


On Fri, 8 Jul 2011 10:08:46 -0700, Tommi Virtanen wrote:
[It seems I dropped the Cc: to ceph-devel, added it back.. Please
reply to this message instead, and sorry about that. I'm starting to
dislike Google Apps for mailing list traffic :( ]

On Fri, Jul 8, 2011 at 10:07, Tommi Virtanen
<tommi.virtanen@xxxxxxxxxxxxx> wrote:
On Fri, Jul 8, 2011 at 01:23, Meng Zhao <mzhao@xxxxxxxxxxxx> wrote:
I was trying to replace a disk for an OSD by following the instructions at:
http://ceph.newdream.net/wiki/Replacing_a_failed_disk/OSD

Now, ceph -w is showing
2011-07-08 15:52:39.702881    pg v1602: 602 pgs: 49 active+degraded, 553 active+clean+degraded; 349 MB data, 333 MB used, 566 MB / 1023 MB avail;
167/224 degraded (74.554%); 55/112 unfound (49.107%)

and a copy operation hangs on the Ceph client forever. I cannot kill (-9) the
cp process. Is there any hope of recovering my Ceph filesystem?

I'm pretty sure that the cp is hanging because Ceph chooses to wait in case the unfound objects do come back (e.g. an OSD comes back online).
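You can usually confirm that from the pg itself; a sketch, with a hypothetical pg id:

    # The recovery state in the output shows the unfound count and
    # which OSDs might still hold the objects ("might_have_unfound")
    ceph pg 0.6 query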

Now, in the default configuration, losing a single OSD should not have
caused unfound objects in the first place. Can you provide any more
information on how you got to that point?

My general question is: how are objects distributed among OSDs? Does 2x replication guarantee that the failure of a single OSD will not lose data? It appears to me that objects are distributed statistically, with no guarantee that replicas end up in physically separate locations.

The CRUSH logic has a special case for that: when picking replicas, if
it would pick an already used bucket, it tries again.

My understanding is that with the default crushmap, replicas will go
to different OSDs, and this is what I see in practice. If you
constructed your own crushmap, it's possible that you made one that
allows multiple replicas on the same drive, or even configured Ceph to
maintain no replicas.
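For illustration, the usual way a crushmap enforces that separation is a chooseleaf step over a failure domain such as host. A sketch of such a rule, using the common default names rather than anything taken from your map:

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            # pick N distinct hosts, then one osd under each,
            # so two replicas never share a drive
            step chooseleaf firstn 0 type host
            step emit
    }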

It is also possible you got hit by a bug, most likely one not in the
placement rules but in the OSD code.

Can you provide more information on your setup and the steps you took?

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


