Have you checked the actual journal objects as the "journal export" suggested? Did you identify any actual source of the damage before issuing the "repaired" command?
What is the history of the filesystems on this cluster?
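For example, rank 0's journal header is the object 200.00000000 in the metadata pool, so (with your pool name) something like:

rados -p cephfs_metadata stat 200.00000000
rados -p cephfs_metadata ls | grep '^200\.'

would show whether the journal objects are even present and readable at the RADOS level.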
On Wed, Jul 11, 2018 at 8:10 AM Alessandro De Salvo <Alessandro.DeSalvo@xxxxxxxxxxxxx> wrote:
Hi,
after the upgrade to luminous 12.2.6 today, all our MDSes have been
marked as damaged. Trying to restart the instances only results in
standby MDSes. We currently have 2 active filesystems with 2 MDSes each.
I found the following error messages in the mon log:
mds.0 <node1_IP>:6800/2412911269 down:damaged
mds.1 <node2_IP>:6800/830539001 down:damaged
mds.0 <node3_IP>:6800/4080298733 down:damaged
Whenever I try to force the repaired state with ceph mds repaired
<fs_name>:<rank>, I get something like this in the MDS logs:
2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro) error getting journal off disk
2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log [ERR] : Error recovering journal 0x201: (5) Input/output error
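If I read this right, journal 0x201 is rank 1's journal (the journal inode is 0x200 plus the rank), so the object to check directly should be 201.00000000 in the metadata pool, e.g.:

rados -p cephfs_metadata stat 201.00000000
rados -p cephfs_metadata get 201.00000000 /tmp/201.00000000

If rados itself returns an I/O error there, I guess the problem is at the RADOS/OSD level rather than in the MDS.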
Any attempt to run the journal export results in errors like this one:
cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
Error ((5) Input/output error)
2018-07-11 17:01:30.631571 7f94354fff00 -1 Header 200.00000000 is unreadable
2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not readable, attempt object-by-object dump with `rados`
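I take it the suggested fallback is to fetch the journal objects one by one, something like:

mkdir -p journal-dump
for obj in $(rados -p cephfs_metadata ls | grep '^200\.'); do
    rados -p cephfs_metadata get "$obj" "journal-dump/$obj"
done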
The same happens for recover_dentries:
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:
2018-07-11 17:04:19.770779 7f05429fef00 -1 Header 200.00000000 is unreadable
Errors: 0
Is there anything I can try to bring the cluster back?
I was able to dump the contents of the metadata pool with rados export
-p cephfs_metadata <filename>, and I'm currently trying the procedure
described in
http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
but I'm not sure it will work, as it's apparently doing nothing at the
moment (maybe it's just very slow).
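In case it's just slow: I see cephfs-data-scan also has --worker_n/--worker_m options to run several workers in parallel, so I may try something like this for the scan_extents phase (four workers; assuming our data pool is named cephfs_data, plus whatever extra flags the alternate-pool procedure needs):

for n in 0 1 2 3; do
    cephfs-data-scan scan_extents --worker_n $n --worker_m 4 cephfs_data &
done
wait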
Any help is appreciated, thanks!
Alessandro
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com