While updating my cluster to use a 2K block size for XFS, I've run into a couple of OSDs that fail to start because of corrupted journals:

=== osd.1 ===
   -10> 2013-11-12 13:40:35.388177 7f030458a7a0 1 filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
    -9> 2013-11-12 13:40:35.388194 7f030458a7a0 1 filestore(/var/lib/ceph/osd/ceph-1) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
    -8> 2013-11-12 13:40:49.735893 7f030458a7a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is supported and appears to work
    -7> 2013-11-12 13:40:49.735955 7f030458a7a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
    -6> 2013-11-12 13:40:49.778879 7f030458a7a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: syscall(SYS_syncfs, fd) fully supported
    -5> 2013-11-12 13:41:02.512202 7f030458a7a0 0 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
    -4> 2013-11-12 13:41:05.932177 7f030458a7a0 2 journal open /var/lib/ceph/osd/ceph-1/journal fsid f7bde53e-458a-4398-a949-770648ddc414 fs_op_seq 2973368
    -3> 2013-11-12 13:41:05.964093 7f030458a7a0 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 20: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
    -2> 2013-11-12 13:41:05.987641 7f030458a7a0 2 journal read_entry 361586688 : seq 2973370 55428 bytes
    -1> 2013-11-12 13:41:05.988024 7f030458a7a0 -1 journal Unable to read past sequence 2973369 but header indicates the journal has committed up through 2980190, journal is corrupt
     0> 2013-11-12 13:41:06.070833 7f030458a7a0 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f030458a7a0 time 2013-11-12 13:41:05.988054
os/FileJournal.cc: 1697: FAILED assert(0)

 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xa46) [0x6d9ab6]
 2: (JournalingObjectStore::journal_replay(unsigned long)+0x325) [0x865835]
 3: (FileStore::mount()+0x2db0) [0x70e330]
 4: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x608dba]
 5: (OSD::convertfs(std::string const&, std::string const&)+0x49) [0x6097c9]
 6: (main()+0x3190) [0x5c65d0]
 7: (__libc_start_main()+0xfd) [0x3ee0e1ecdd]
 8: /usr/bin/ceph-osd() [0x5c3089]

=== osd.4 ===
   -10> 2013-11-11 16:31:52.697736 7fefe710e7a0 1 filestore(/var/lib/ceph/osd/ceph-4) mount detected xfs
    -9> 2013-11-11 16:31:52.697764 7fefe710e7a0 1 filestore(/var/lib/ceph/osd/ceph-4) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
    -8> 2013-11-11 16:32:06.301437 7fefe710e7a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP ioctl is supported and appears to work
    -7> 2013-11-11 16:32:06.301478 7fefe710e7a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
    -6> 2013-11-11 16:32:06.321094 7fefe710e7a0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: syscall(SYS_syncfs, fd) fully supported
    -5> 2013-11-11 16:32:06.642899 7fefe710e7a0 0 filestore(/var/lib/ceph/osd/ceph-4) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
    -4> 2013-11-11 16:32:10.047982 7fefe710e7a0 2 journal open /var/lib/ceph/osd/ceph-4/journal fsid 1c68cdc3-4ba1-4711-86a2-517d32b352fa fs_op_seq 2964169
    -3> 2013-11-11 16:32:10.062596 7fefe710e7a0 1 journal _open /var/lib/ceph/osd/ceph-4/journal fd 21: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
    -2> 2013-11-11 16:32:10.132954 7fefe710e7a0 2 journal read_entry 993447936 : seq 2964171 8007 bytes
    -1> 2013-11-11 16:32:10.133125 7fefe710e7a0 -1 journal Unable to read past sequence 2964170 but header indicates the journal has committed up through 2967854, journal is corrupt
     0> 2013-11-11 16:32:10.135432 7fefe710e7a0 -1 os/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7fefe710e7a0 time 2013-11-11 16:32:10.133149
os/FileJournal.cc: 1697: FAILED assert(0)

 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0xa46) [0x6d9ab6]
 2: (JournalingObjectStore::journal_replay(unsigned long)+0x325) [0x865835]
 3: (FileStore::mount()+0x2db0) [0x70e330]
 4: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x608dba]
 5: (OSD::convertfs(std::string const&, std::string const&)+0x49) [0x6097c9]
 6: (main()+0x3190) [0x5c65d0]
 7: (__libc_start_main()+0xfd) [0x3ee0e1ecdd]
 8: /usr/bin/ceph-osd() [0x5c3089]

What's the best way to recover from this situation?

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com