Thanks for the reply. I have had some more time to experiment with this now.

I understand that the best approach is to let the cluster rebuild the entire OSD, but because I am currently using only one replica and 2 of my 3 machines had problems, I ended up in a bad situation. With OSDs down on 2 machines and only one replica, I think I would certainly have lost data if I had rebuilt them from scratch. Luckily, in my case no new data was being written to the cluster at the time; I only use it as a NAS in my home lab. It worked out fine for me this time, but anyone reading this should know it is not a recommended way to do things.

I got confused because I was reusing a logical volume as the journal and didn't wipe it properly before I ran "--mkjournal". Wiping it properly and then running "--mkjournal" again seems to have solved the problem for me.

My only outstanding issue now is one pg that remains inconsistent even after attempting a repair; apart from that, everything seems to be fine. I haven't dug into that much yet. With only one replica, I guess it is tricky to tell which of the replicas is the broken one.

I will add a note to that ticket. It happened when power to the server was lost while replicating, and I think that is what corrupted the two journals.

Cheers,
Claes

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: 12 January 2015 15:46
To: Sahlstrom, Claes
Cc: ceph-users@xxxxxxxx
Subject: Re: Replace corrupt journal

On Sun, 11 Jan 2015, Sahlstrom, Claes wrote:
> Hi,
>
> I have a problem starting a couple of OSDs because of the journal
> being corrupt. Is there any way to replace the journal and keeping the
> rest of the OSD intact.

It is risky at best... I would not recommend it! The safe route is to wipe the OSD and let the cluster repair.

> -1> 2015-01-11 16:02:54.475138 7fb32df86900 -1 journal Unable to
> read past sequence 8188178 but header indicates the journal has
> committed up through 8188206, journal is corrupt
>
> 0> 2015-01-11 16:02:54.479296 7fb32df86900 -1 os/FileJournal.cc:
> In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)'
> thread 7fb32df86900 time 2015-01-11 16:02:54.475276
>
> os/FileJournal.cc: 1693: FAILED assert(0)

Do you mind making a note that you saw this on this ticket:

http://tracker.ceph.com/issues/6003

We see it periodically in QA but have never been able to track it down. It could also be caused by a hardware issue, so any information about whether the journal device appears damaged would be helpful.

Thanks!
sage
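
For anyone scanning OSD logs for the same failure: the journal error quoted above carries two sequence numbers, the last entry that could be read and the last entry the header claims was committed. The following is only a rough sketch, not a Ceph tool. The function name `journal_gap` and the regular expression are assumptions based solely on the message text quoted in this thread, and the wording may differ in other Ceph versions.

```python
import re
from typing import Optional, Tuple

# Assumed message format, copied from the log line quoted in this thread.
_PATTERN = re.compile(
    r"Unable to read past sequence (\d+) but header indicates "
    r"the journal has committed up through (\d+)"
)


def journal_gap(log_text: str) -> Optional[Tuple[int, int, int]]:
    """Return (last_readable, committed, gap), or None if the message is absent.

    gap is the number of entries the header reports as committed
    beyond the last sequence number that could be read.
    """
    # Mail quoting may wrap the line and insert '>' markers,
    # so drop them and collapse all whitespace before matching.
    flat = " ".join(log_text.replace(">", " ").split())
    match = _PATTERN.search(flat)
    if match is None:
        return None
    last_readable = int(match.group(1))
    committed = int(match.group(2))
    return last_readable, committed, committed - last_readable


if __name__ == "__main__":
    sample = (
        "2015-01-11 16:02:54.475138 7fb32df86900 -1 journal Unable to read past "
        "sequence 8188178 but header indicates the journal has committed up "
        "through 8188206, journal is corrupt"
    )
    print(journal_gap(sample))
```

The gap only shows how far the header and the readable entries disagree. It does not say whether those entries can be recovered or whether the journal device itself is damaged.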