Re: mds daemon damaged

Hi,

all this sounds an awful lot like:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/027992.html
In that case, things started with an update to 12.2.6. Which version are you running?
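
If you're not sure, "ceph versions" run from an admin node should show which release each mon/mgr/osd/mds daemon is actually running:

# ceph versions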

Cheers,
Oliver

On 12.07.2018 at 23:30, Kevin wrote:
> Sorry for the long posting but trying to cover everything
> 
> I woke up to find my cephfs filesystem down. This was in the logs
> 
> 2018-07-11 05:54:10.398171 osd.1 [ERR] 2.4 full-object read crc 0x6fc2f65a != expected 0x1c08241c on 2:292cf221:::200.00000000:head
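> 
> (If it helps: I believe 200.00000000 is the MDS journal header object, and the leading "2:" in the log should be the pool id, i.e. the metadata pool. I assume the mapping could be double-checked with something like the following; the pool name is just a guess for my setup:)
> 
> # ceph osd pool ls detail                    # map pool id 2 to its name
> # ceph osd map cephfs_metadata 200.00000000  # should report pg 2.4 for this object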
> 
> I had one standby MDS, but as far as I can tell it did not fail over. This was in the logs
> 
> (insufficient standby MDS daemons available)
> 
> Currently my ceph looks like this
>   cluster:
>     id:     ......................
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 mds daemon damaged
> 
>   services:
>     mon: 6 daemons, quorum ds26,ds27,ds2b,ds2a,ds28,ds29
>     mgr: ids27(active)
>     mds: test-cephfs-1-0/1/1 up , 3 up:standby, 1 damaged
>     osd: 5 osds: 5 up, 5 in
> 
>   data:
>     pools:   3 pools, 202 pgs
>     objects: 1013k objects, 4018 GB
>     usage:   12085 GB used, 6544 GB / 18630 GB avail
>     pgs:     201 active+clean
>              1   active+clean+scrubbing+deep
> 
>   io:
>     client:   0 B/s rd, 0 op/s rd, 0 op/s wr
> 
> I started trying to get the damaged MDS back online
> 
> Based on this page http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> 
> # cephfs-journal-tool journal export backup.bin
> 2018-07-12 13:35:15.675964 7f3e1389bf00 -1 Header 200.00000000 is unreadable
> 2018-07-12 13:35:15.675977 7f3e1389bf00 -1 journal_export: Journal not readable, attempt object-by-object dump with `rados`
> Error ((5) Input/output error)
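> 
> The tool suggests an object-by-object dump with rados. I haven't tried that yet, but I assume it would look roughly like this (metadata pool name is a guess; the rank-0 journal objects should all be named 200.*):
> 
> # rados -p cephfs_metadata ls | grep '^200\.'                  # list the journal objects
> # rados -p cephfs_metadata get 200.00000000 200.00000000.bin   # dump the unreadable header object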
> 
> # cephfs-journal-tool event recover_dentries summary
> Events by type:
> 2018-07-12 13:36:03.000590 7fc398a18f00 -1 Header 200.00000000 is unreadableErrors: 0
> 
> cephfs-journal-tool journal reset - (I think this command might have worked)
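> 
> From what I've read, after a journal reset the damaged rank still has to be marked repaired before a standby will take it over; I assume that would be something like:
> 
> # ceph mds repaired test-cephfs-1:0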
> 
> Next up, tried to reset the filesystem
> 
> ceph fs reset test-cephfs-1 --yes-i-really-mean-it
> 
> Each time, the same errors appear:
> 
> 2018-07-12 11:56:35.760449 mon.ds26 [INF] Health check cleared: MDS_DAMAGE (was: 1 mds daemon damaged)
> 2018-07-12 11:56:35.856737 mon.ds26 [INF] Standby daemon mds.ds27 assigned to filesystem test-cephfs-1 as rank 0
> 2018-07-12 11:56:35.947801 mds.ds27 [ERR] Error recovering journal 0x200: (5) Input/output error
> 2018-07-12 11:56:36.900807 mon.ds26 [ERR] Health check failed: 1 mds daemon damaged (MDS_DAMAGE)
> 2018-07-12 11:56:35.945544 osd.0 [ERR] 2.4 full-object read crc 0x6fc2f65a != expected 0x1c08241c on 2:292cf221:::200.00000000:head
> 2018-07-12 12:00:00.000142 mon.ds26 [ERR] overall HEALTH_ERR 1 filesystem is degraded; 1 mds daemon damaged
> 
> Tried to 'fail' mds.ds27
> # ceph mds fail ds27
> # failed mds gid 1929168
> 
> Command worked, but each time I run the reset command the same errors above appear
> 
> Online searches say the object with the read error has to be removed, but there's no object listed. This web page is the closest to the issue:
> http://tracker.ceph.com/issues/20863
> 
> It recommends fixing the error by hand. I tried running a deep scrub on pg 2.4; it completes, but I still have the same issue as above.
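> 
> For reference, this is roughly what I ran, plus the follow-up steps that I understand should list and repair an inconsistent copy (the repair I have not run yet):
> 
> # ceph pg deep-scrub 2.4
> # rados list-inconsistent-obj 2.4 --format=json-pretty   # should show which copy failed the crc check
> # ceph pg repair 2.4                                      # not run yet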
> 
> The final option is to attempt removing mds.ds27. If mds.ds29 was a standby and has the data, it should become live. If it was not,
> I assume we will lose the filesystem at this point.
> 
> Why didn't the standby MDS fail over?
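> 
> I assume the filesystem map would at least show whether a standby was considered eligible:
> 
> # ceph fs get test-cephfs-1   # dumps the fs map (max_mds, standby settings, rank states)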
> 
> Just looking for any way to recover the cephfs, thanks!
> 



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
