On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo <Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> after the upgrade to luminous 12.2.6 today, all our MDSes have been
> marked as damaged. Trying to restart the instances only results in
> standby MDSes. We currently have 2 active filesystems with 2 MDSes each.
>
> I found the following error messages in the mon:
>
> mds.0 <node1_IP>:6800/2412911269 down:damaged
> mds.1 <node2_IP>:6800/830539001 down:damaged
> mds.0 <node3_IP>:6800/4080298733 down:damaged
>
> Whenever I try to force the repaired state with ceph mds repaired
> <fs_name>:<rank> I get something like this in the MDS logs:
>
> 2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro) error getting journal off disk
> 2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log [ERR] : Error recovering journal 0x201: (5) Input/output error

An EIO reading the journal header is pretty scary. The MDS itself
probably can't tell you much more about this: you need to dig down
into the RADOS layer. Try reading the 200.00000000 object (that
happens to be the rank 0 journal header; every CephFS filesystem
should have one) using the `rados` command line tool.

John

>
> Any attempt at running the journal export results in errors, like this one:
>
> cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
> Error ((5) Input/output error)
> 2018-07-11 17:01:30.631571 7f94354fff00 -1 Header 200.00000000 is unreadable
> 2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not readable, attempt object-by-object dump with `rados`
>
> The same happens for recover_dentries:
>
> cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
> Events by type:
> 2018-07-11 17:04:19.770779 7f05429fef00 -1 Header 200.00000000 is unreadable
> Errors: 0
>
> Is there anything I could try to get the cluster back?
>
> I was able to dump the contents of the metadata pool with rados export
> -p cephfs_metadata <filename> and I'm currently trying the procedure
> described in
> http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
> but I'm not sure if it will work, as it's apparently doing nothing at the
> moment (maybe it's just very slow).
>
> Any help is appreciated, thanks!
>
> Alessandro

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
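
For anyone following along, here is a minimal sketch of the RADOS-level check John suggests. It assumes the metadata pool is named cephfs_metadata (as in Alessandro's rados export command) and that the rank 1 journal header, matching the "journal 0x201" error, lives in the 201.00000000 object; the output file names are arbitrary:

    # Confirm the journal header objects exist and are readable
    rados -p cephfs_metadata stat 200.00000000
    rados -p cephfs_metadata stat 201.00000000

    # Pull a copy of each header off the cluster for inspection/backup
    rados -p cephfs_metadata get 200.00000000 journal-header.0.bin
    rados -p cephfs_metadata get 201.00000000 journal-header.1.bin

    # If a read returns EIO, map the object to its PG and the OSDs
    # serving it, to see which devices to investigate next
    ceph osd map cephfs_metadata 200.00000000

If the stat or get also fails with EIO, the problem is below CephFS, in the OSDs hosting that placement group, and that is where to look before attempting any journal repair.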