Hi,

Could you explain the reason behind "if the journal is lost, the OSD is lost"? If the journal is lost, only the part that had not yet been replayed is actually lost.

Take a similar case as an example: an OSD is down for some time, so its journal is out of date (it is missing part of the journal), yet it can still catch up with the other OSDs. Why? That example suggests that either an outdated OSD can get the missing journal entries from the other OSDs, or "catching up" works by a different mechanism than the journal.

Could you explain?

Thanks

At 2014-08-14 05:21:20, "Craig Lewis" <clewis at centraldesktop.com> wrote:

If the journal is lost, the OSD is lost. This can be a problem if you use one SSD for the journals of many OSDs.

There has been some discussion about making OSDs able to recover from a lost journal, but I haven't heard anything else about it. I haven't been paying much attention to the developer mailing list, though.

For your second question, I'd start by looking at the source code in src/osd/ReplicatedPG.cc (for standard replication) or src/osd/ECBackend.cc (for erasure coding). I'm not a Ceph developer, though, so that might not be the right place to start.

On Tue, Aug 12, 2014 at 7:08 PM, yuelongguang <fastsync at 163.com> wrote:

Hi all,

1. Can an OSD start up if its journal is lost and has not been replayed?

2. How does it catch up to the latest epoch? Taking an OSD as the example, where is the code? Please cover both cases, with the journal lost and with it intact. As I understand it, the journal only contains metadata and read/write operations, not the file data itself.

Thanks