Recover from corrupted journals

While updating my cluster to use a 2K block size for XFS, I've run
into a couple of OSDs that fail to start because of corrupted journals:

=== osd.1 ===
   -10> 2013-11-12 13:40:35.388177 7f030458a7a0  1
filestore(/var/lib/ceph/osd/ceph-1) mount detected xfs
    -9> 2013-11-12 13:40:35.388194 7f030458a7a0  1
filestore(/var/lib/ceph/osd/ceph-1)  disabling 'filestore replica
fadvise' due to known issues with fadvise(DONTNEED) on xfs
    -8> 2013-11-12 13:40:49.735893 7f030458a7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is supported and appears to work
    -7> 2013-11-12 13:40:49.735955 7f030458a7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
    -6> 2013-11-12 13:40:49.778879 7f030458a7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
syscall(SYS_syncfs, fd) fully supported
    -5> 2013-11-12 13:41:02.512202 7f030458a7a0  0
filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
    -4> 2013-11-12 13:41:05.932177 7f030458a7a0  2 journal open
/var/lib/ceph/osd/ceph-1/journal fsid
f7bde53e-458a-4398-a949-770648ddc414 fs_op_seq 2973368
    -3> 2013-11-12 13:41:05.964093 7f030458a7a0  1 journal _open
/var/lib/ceph/osd/ceph-1/journal fd 20: 1072693248 bytes, block size
4096 bytes, directio = 1, aio = 1
    -2> 2013-11-12 13:41:05.987641 7f030458a7a0  2 journal read_entry
361586688 : seq 2973370 55428 bytes
    -1> 2013-11-12 13:41:05.988024 7f030458a7a0 -1 journal Unable to
read past sequence 2973369 but header indicates the journal has
committed up through 2980190, journal is corrupt
     0> 2013-11-12 13:41:06.070833 7f030458a7a0 -1 os/FileJournal.cc:
In function 'bool FileJournal::read_entry(ceph::bufferlist&,
uint64_t&, bool*)' thread 7f030458a7a0 time 2013-11-12 13:41:05.988054
os/FileJournal.cc: 1697: FAILED assert(0)

 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&,
bool*)+0xa46) [0x6d9ab6]
 2: (JournalingObjectStore::journal_replay(unsigned long)+0x325) [0x865835]
 3: (FileStore::mount()+0x2db0) [0x70e330]
 4: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x608dba]
 5: (OSD::convertfs(std::string const&, std::string const&)+0x49) [0x6097c9]
 6: (main()+0x3190) [0x5c65d0]
 7: (__libc_start_main()+0xfd) [0x3ee0e1ecdd]
 8: /usr/bin/ceph-osd() [0x5c3089]


=== osd.4 ===
   -10> 2013-11-11 16:31:52.697736 7fefe710e7a0  1
filestore(/var/lib/ceph/osd/ceph-4) mount detected xfs
    -9> 2013-11-11 16:31:52.697764 7fefe710e7a0  1
filestore(/var/lib/ceph/osd/ceph-4)  disabling 'filestore replica
fadvise' due to known issues with fadvise(DONTNEED) on xfs
    -8> 2013-11-11 16:32:06.301437 7fefe710e7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
FIEMAP ioctl is supported and appears to work
    -7> 2013-11-11 16:32:06.301478 7fefe710e7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
    -6> 2013-11-11 16:32:06.321094 7fefe710e7a0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features:
syscall(SYS_syncfs, fd) fully supported
    -5> 2013-11-11 16:32:06.642899 7fefe710e7a0  0
filestore(/var/lib/ceph/osd/ceph-4) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
    -4> 2013-11-11 16:32:10.047982 7fefe710e7a0  2 journal open
/var/lib/ceph/osd/ceph-4/journal fsid
1c68cdc3-4ba1-4711-86a2-517d32b352fa fs_op_seq 2964169
    -3> 2013-11-11 16:32:10.062596 7fefe710e7a0  1 journal _open
/var/lib/ceph/osd/ceph-4/journal fd 21: 1072693248 bytes, block size
4096 bytes, directio = 1, aio = 1
    -2> 2013-11-11 16:32:10.132954 7fefe710e7a0  2 journal read_entry
993447936 : seq 2964171 8007 bytes
    -1> 2013-11-11 16:32:10.133125 7fefe710e7a0 -1 journal Unable to
read past sequence 2964170 but header indicates the journal has
committed up through 2967854, journal is corrupt
     0> 2013-11-11 16:32:10.135432 7fefe710e7a0 -1 os/FileJournal.cc:
In function 'bool FileJournal::read_entry(ceph::bufferlist&,
uint64_t&, bool*)' thread 7fefe710e7a0 time 2013-11-11 16:32:10.133149
os/FileJournal.cc: 1697: FAILED assert(0)

 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217)
 1: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&,
bool*)+0xa46) [0x6d9ab6]
 2: (JournalingObjectStore::journal_replay(unsigned long)+0x325) [0x865835]
 3: (FileStore::mount()+0x2db0) [0x70e330]
 4: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x608dba]
 5: (OSD::convertfs(std::string const&, std::string const&)+0x49) [0x6097c9]
 6: (main()+0x3190) [0x5c65d0]
 7: (__libc_start_main()+0xfd) [0x3ee0e1ecdd]
 8: /usr/bin/ceph-osd() [0x5c3089]
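
To gauge how far each journal falls short, the two sequence numbers in the
"journal is corrupt" line can be compared. Below is a minimal Python sketch
that pulls them out of a log excerpt. The helper name journal_gap is
hypothetical and not part of Ceph, and treating the difference as the number
of entries that cannot be replayed is my assumption from the message wording.

```python
import re

# Matches the FileJournal message shown in the logs above. The pattern is
# applied after whitespace has been collapsed, because the mail client
# wrapped the original log lines.
CORRUPT_RE = re.compile(
    r"Unable to read past sequence (\d+) but header indicates "
    r"the journal has committed up through (\d+)"
)


def journal_gap(log_text):
    """Return (last_readable_seq, committed_seq, gap), or None if no match.

    journal_gap is a hypothetical helper for illustration only.
    """
    # Collapse newlines and runs of spaces so wrapped lines match.
    flat = " ".join(log_text.split())
    match = CORRUPT_RE.search(flat)
    if match is None:
        return None
    last_readable = int(match.group(1))
    committed = int(match.group(2))
    return last_readable, committed, committed - last_readable


if __name__ == "__main__":
    # Excerpt copied from the osd.1 log above, including the line wrapping.
    osd1 = """-1> 2013-11-12 13:41:05.988024 7f030458a7a0 -1 journal Unable to
read past sequence 2973369 but header indicates the journal has
committed up through 2980190, journal is corrupt"""
    print(journal_gap(osd1))
```

Running the same function over the osd.4 excerpt gives the corresponding
numbers for that journal.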


What's the best way to recover from this situation?

Thanks,
Bryan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



