Re: cephfs: how to repair damaged mds rank?

On Mon, Oct 9, 2017 at 8:17 AM, Daniel Baumann <daniel.baumann@xxxxxx> wrote:
> Hi all,
>
> we have a Ceph cluster (12.2.1) with 9 MDS ranks in multi-MDS mode.
>
> "out of the blue", rank 6 is marked as damaged (and all other MDS are in
> state up:resolve) and I can't bring the FS up again.
>
> 'ceph -s' says:
> [...]
>             1 filesystem is degraded
>             1 mds daemon damaged
>
>     mds: cephfs-8/9/9 up
> {0=mds1=up:resolve,1=mds2=up:resolve,2=mds3=up:resolve,3=mds4=up:resolve,4=mds5=up:resolve,5=mds6=up:resolve,7=mds7=
> up:resolve,8=mds8=up:resolve}, 1 up:standby, 1 damaged
> [...]
>
> 'ceph fs get cephfs' says:
> [...]
> max_mds 9
> in      0,1,2,3,4,5,6,7,8
> up
> {0=28309098,1=28309128,2=28309149,3=28309188,4=28309209,5=28317918,7=28311732,8=28312272}
> failed
> damaged 6
> stopped
> [...]
> 28309098:       147.87.226.60:6800/2627352929 'mds1' mds.0.95936
> up:resolve seq 3
> 28309128:       147.87.226.61:6800/416822271 'mds2' mds.1.95939
> up:resolve seq 3
> 28309149:       147.87.226.62:6800/1969015920 'mds3' mds.2.95942
> up:resolve seq 3
> 28309188:       147.87.226.184:6800/4074580566 'mds4' mds.3.95945
> up:resolve seq 3
> 28309209:       147.87.226.185:6800/805082194 'mds5' mds.4.95948
> up:resolve seq 3
> 28317918:       147.87.226.186:6800/1913199036 'mds6' mds.5.95984
> up:resolve seq 3
> 28311732:       147.87.226.187:6800/4117561729 'mds7' mds.7.95957
> up:resolve seq 3
> 28312272:       147.87.226.188:6800/2936268159 'mds8' mds.8.95960
> up:resolve seq 3
>
>
> I think I've tried almost everything already without success :(, including:
>
>   * stopping all MDS daemons and bringing them up one after another
>     (works nicely for the first ones up to rank 5, then the next one
>      just grabs rank 7 and no MDS after that wants to take rank 6)
>
>   * stopping all MDS daemons, flushing the MDS journal, manually marking
>     rank 6 as repaired, then starting all MDS daemons again.
>
>   * switching back to only one MDS (stopping all MDS daemons, setting
>     max_mds=1, disallowing multi-MDS, disallowing dirfrag, removing
>     "mds_bal_frag=true" from ceph.conf, then starting the first MDS);
>     this didn't work -- the single MDS stayed in up:resolve forever.
>
>   * during all of the above, all CephFS clients have been unmounted,
>     so there is no access (stale or otherwise) to the FS
>
>   * I did find a few things in the mailing list archive, but it seems
>     there's nothing conclusive on how to get it back online ("formatting"
>     the FS is not possible). I didn't dare try 'ceph mds rmfailed 6'
>     for fear of data loss.
>
>
> How can I get it back online?
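
For reference, the repaired-flag and single-MDS fallback steps described
above map roughly onto the commands below. This is only a sketch: the fs
name "cephfs" and the MDS name "mds1" come from the output above, and
whether the allow_multimds/allow_dirfrags settings exist (or are needed)
depends on the release.

    # flush the journal of a running MDS via its admin socket
    ceph daemon mds.mds1 flush journal

    # clear the damaged flag on rank 6 (it only stays cleared if the
    # underlying metadata is actually readable again)
    ceph mds repaired cephfs:6    # some releases take just the rank: 6

    # fall back to a single active MDS
    ceph fs set cephfs max_mds 1
    ceph fs set cephfs allow_multimds false
    ceph fs set cephfs allow_dirfrags false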

When a rank is "damaged", it means that rank is blocked from starting
because Ceph thinks its on-disk metadata is damaged -- no amount of
restarting daemons will help.
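
The damaged flag lives in the FSMap on the monitors, not in the MDS
daemons themselves, which is why restarting daemons doesn't clear it.
As a quick sketch for inspecting it (fs name as above):

    # "damaged 6" in this output is the flag in question
    ceph fs get cephfs

    # the full FSMap, including the damaged set and standbys
    ceph fs dump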

The place to start with the investigation is to find the source of the
damage.  Look in your monitor log for "marking rank 6 damaged", and
then look in your MDS logs at that timestamp (find the MDS that held
rank 6 at the time).
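
A minimal sketch of that search, assuming the default log locations
under /var/log/ceph/ (adjust paths and daemon names for your setup):

    # on the monitor hosts: find when the rank was marked damaged
    grep "marking rank 6 damaged" /var/log/ceph/ceph-mon.*.log

    # then, on the host running the MDS that held rank 6 at that point,
    # look at its log around that timestamp
    grep -C 50 "<timestamp from the mon log>" /var/log/ceph/ceph-mds.<name>.log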

John

> The relevant portion from the ceph-mds log (when starting mds9 which
> should then take up rank 6; I'm happy to provide any logs):
>
> ---snip---
> 2017-10-09 08:55:56.418237 7f1ec6ef3240  0 ceph version 12.2.1
> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process
> (unknown), pid 421
> 2017-10-09 08:55:56.421672 7f1ec6ef3240  0 pidfile_write: ignore empty
> --pid-file
> 2017-10-09 08:56:00.990530 7f1ebf457700  1 mds.mds9 handle_mds_map standby
> 2017-10-09 08:56:00.997044 7f1ebf457700  1 mds.6.95988 handle_mds_map i
> am now mds.6.95988
> 2017-10-09 08:56:00.997053 7f1ebf457700  1 mds.6.95988 handle_mds_map
> state change up:boot --> up:replay
> 2017-10-09 08:56:00.997068 7f1ebf457700  1 mds.6.95988 replay_start
> 2017-10-09 08:56:00.997076 7f1ebf457700  1 mds.6.95988  recovery set is
> 0,1,2,3,4,5,7,8
> 2017-10-09 08:56:01.003203 7f1eb8c4a700  0 mds.6.cache creating system
> inode with ino:0x106
> 2017-10-09 08:56:01.003592 7f1eb8c4a700  0 mds.6.cache creating system
> inode with ino:0x1
> 2017-10-09 08:56:01.016403 7f1eba44d700 -1 mds.6.journaler.pq(ro)
> _decode error from assimilate_prefetch
> 2017-10-09 08:56:01.016425 7f1eba44d700 -1 mds.6.purge_queue _recover:
> Error -22 recovering write_pos
> 2017-10-09 08:56:01.019746 7f1eba44d700  1 mds.mds9 respawn
> 2017-10-09 08:56:01.019762 7f1eba44d700  1 mds.mds9  e: '/usr/bin/ceph-mds'
> 2017-10-09 08:56:01.019765 7f1eba44d700  1 mds.mds9  0: '/usr/bin/ceph-mds'
> 2017-10-09 08:56:01.019767 7f1eba44d700  1 mds.mds9  1: '-f'
> 2017-10-09 08:56:01.019769 7f1eba44d700  1 mds.mds9  2: '--cluster'
> 2017-10-09 08:56:01.019771 7f1eba44d700  1 mds.mds9  3: 'ceph'
> 2017-10-09 08:56:01.019772 7f1eba44d700  1 mds.mds9  4: '--id'
> 2017-10-09 08:56:01.019773 7f1eba44d700  1 mds.mds9  5: 'mds9'
> 2017-10-09 08:56:01.019774 7f1eba44d700  1 mds.mds9  6: '--setuser'
> 2017-10-09 08:56:01.019775 7f1eba44d700  1 mds.mds9  7: 'ceph'
> 2017-10-09 08:56:01.019776 7f1eba44d700  1 mds.mds9  8: '--setgroup'
> 2017-10-09 08:56:01.019778 7f1eba44d700  1 mds.mds9  9: 'ceph'
> 2017-10-09 08:56:01.019811 7f1eba44d700  1 mds.mds9 respawning with exe
> /usr/bin/ceph-mds
> 2017-10-09 08:56:01.019814 7f1eba44d700  1 mds.mds9  exe_path /proc/self/exe
> 2017-10-09 08:56:01.046396 7f5ed6090240  0 ceph version 12.2.1
> (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process
> (unknown), pid 421
> 2017-10-09 08:56:01.049516 7f5ed6090240  0 pidfile_write: ignore empty
> --pid-file
> 2017-10-09 08:56:05.162732 7f5ecee32700  1 mds.mds9 handle_mds_map standby
> [...]
> ---snap---
>
> Regards,
> Daniel
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


