[It seems I dropped the Cc: to ceph-devel; I've added it back. Please reply to this message instead, and sorry about that. I'm starting to dislike Google Apps for mailing list traffic :( ]

On Fri, Jul 8, 2011 at 10:07, Tommi Virtanen <tommi.virtanen@xxxxxxxxxxxxx> wrote:
> On Fri, Jul 8, 2011 at 01:23, Meng Zhao <mzhao@xxxxxxxxxxxx> wrote:
>> I was trying to replace a disk for an OSD by following the instructions at:
>> http://ceph.newdream.net/wiki/Replacing_a_failed_disk/OSD
>>
>> Now ceph -w is reporting
>> 2011-07-08 15:52:39.702881    pg v1602: 602 pgs: 49 active+degraded, 553
>> active+clean+degraded; 349 MB data, 333 MB used, 566 MB / 1023 MB avail;
>> 167/224 degraded (74.554%); 55/112 unfound (49.107%)
>>
>> and a copy operation on the ceph client hangs forever. I cannot kill (-9) the
>> cp process. Is there any hope of recovering my ceph filesystem?
>
> I'm pretty sure the cp is hanging because Ceph chooses to wait in
> case the unfound objects do come back (e.g. an OSD comes back online).
>
> Now, in the default configuration, losing a single OSD should not have
> caused unfound objects in the first place. Can you provide any more
> information on how you got to that point?
>
>> My general question is: how are objects distributed among OSDs? Does
>> 2x replication guarantee that the failure of a single OSD will not lose
>> data? It appears to me that objects are statistically redistributed, with
>> no guarantee that replicas end up on physically separate devices.
>
> The CRUSH logic has a special case for that: when picking replicas, if
> it would pick an already used bucket, it tries again.
>
> My understanding is that with the default crushmap, replicas go
> to different OSDs, and this is what I see in practice. If you constructed
> your own crushmap, it's possible that you made one that allows
> multiple replicas on the same drive, or even configured Ceph to
> maintain no replicas.
>
> It is also possible you got hit by a bug, most likely not in the
> placement rules but in the OSD code.
>
> Can you provide more information on your setup and the steps you took?
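
As a side note on the ceph -w line quoted above: the two ratios are plain object counts, and the 112 objects lining up with 224 copies is consistent with 2x replication. A quick sanity check of the arithmetic (variable names are mine, not Ceph's):

    # Reproduce the percentages from the "ceph -w" status line.
    degraded_copies, total_copies = 167, 224     # 167/224 degraded (74.554%)
    unfound_objects, total_objects = 55, 112     # 55/112 unfound (49.107%)

    print(f"degraded: {degraded_copies / total_copies:.3%}")    # -> 74.554%
    print(f"unfound:  {unfound_objects / total_objects:.3%}")   # -> 49.107%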
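
To make the retry-on-collision idea above concrete, here is a minimal Python sketch of replica selection that re-draws whenever it would land on an already-chosen OSD. This is not the real CRUSH implementation (which walks a weighted bucket hierarchy); the hash-based draw and OSD names are simplified stand-ins.

    import hashlib

    def place_replicas(object_name, osds, num_replicas, max_retries=50):
        """Pick num_replicas distinct OSDs for an object.

        Simplified stand-in for CRUSH: each draw is a deterministic
        function of (object name, replica number, attempt), and a draw
        that collides with an already-chosen OSD is simply retried.
        """
        chosen = []
        for replica in range(num_replicas):
            for attempt in range(max_retries):
                key = f"{object_name}:{replica}:{attempt}".encode()
                index = int(hashlib.sha1(key).hexdigest(), 16) % len(osds)
                candidate = osds[index]
                if candidate not in chosen:   # collision? try again
                    chosen.append(candidate)
                    break
        return chosen

    # With 2x replication the two copies land on different OSDs, so losing
    # a single OSD should leave one copy intact.
    print(place_replicas("some-object", ["osd.0", "osd.1", "osd.2"], 2))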