Hi John,

On 10/09/2017 10:47 AM, John Spray wrote:
> When a rank is "damaged", that means the MDS rank is blocked from
> starting because Ceph thinks the on-disk metadata is damaged -- no
> amount of restarting things will help.

Thanks.

> The place to start with the investigation is to find the source of the
> damage. Look in your monitor log for "marking rank 6 damaged"

I found this in the mon log:

2017-10-09 03:24:28.207424 7f3290710700 0 log_channel(cluster) log [DBG] : mds.6 147.87.226.187:6800/1120166215 down:damaged

So at the time it was marked damaged, rank 6 was running on mds7.

> and then look in your MDS logs at that timestamp (find the MDS that held
> rank 6 at the time).

Looking at the mds7 log for that timespan, this is what I think happened:

* at "early" 03:24, mds7 was serving rank 5, crashed, restarted automatically twice, and then picked up rank 6 at 03:24:21.
* at 03:24:21, mds7 came back up, took over rank 6 and went into replay, then hit a purge queue error, respawned, and ended up in standby-replay for rank 1(?):

2017-10-09 03:24:21.598446 7f70ca01c240 0 set uid:gid to 64045:64045 (ceph:ceph)
2017-10-09 03:24:21.598469 7f70ca01c240 0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 1337
2017-10-09 03:24:21.601958 7f70ca01c240 0 pidfile_write: ignore empty --pid-file
2017-10-09 03:24:26.108545 7f70c2580700 1 mds.mds7 handle_mds_map standby
2017-10-09 03:24:26.115469 7f70c2580700 1 mds.6.95474 handle_mds_map i am now mds.6.95474
2017-10-09 03:24:26.115479 7f70c2580700 1 mds.6.95474 handle_mds_map state change up:boot --> up:replay
2017-10-09 03:24:26.115493 7f70c2580700 1 mds.6.95474 replay_start
2017-10-09 03:24:26.115502 7f70c2580700 1 mds.6.95474 recovery set is 0,1,2,3,4,5,7,8
2017-10-09 03:24:26.115511 7f70c2580700 1 mds.6.95474 waiting for osdmap 18284 (which blacklists prior instance)
2017-10-09 03:24:26.536629 7f70bc574700 0 mds.6.cache creating system inode with ino:0x106
2017-10-09 03:24:26.537009 7f70bc574700 0 mds.6.cache creating system inode with ino:0x1
2017-10-09 03:24:27.233759 7f70bd576700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch
2017-10-09 03:24:27.233780 7f70bd576700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos
2017-10-09 03:24:27.238820 7f70bd576700 1 mds.mds7 respawn
2017-10-09 03:24:27.238828 7f70bd576700 1 mds.mds7 e: '/usr/bin/ceph-mds'
2017-10-09 03:24:27.238831 7f70bd576700 1 mds.mds7 0: '/usr/bin/ceph-mds'
2017-10-09 03:24:27.238833 7f70bd576700 1 mds.mds7 1: '-f'
2017-10-09 03:24:27.238835 7f70bd576700 1 mds.mds7 2: '--cluster'
2017-10-09 03:24:27.238836 7f70bd576700 1 mds.mds7 3: 'ceph'
2017-10-09 03:24:27.238838 7f70bd576700 1 mds.mds7 4: '--id'
2017-10-09 03:24:27.238839 7f70bd576700 1 mds.mds7 5: 'mds7'
2017-10-09 03:24:27.239567 7f70bd576700 1 mds.mds7 6: '--setuser'
2017-10-09 03:24:27.239579 7f70bd576700 1 mds.mds7 7: 'ceph'
2017-10-09 03:24:27.239580 7f70bd576700 1 mds.mds7 8: '--setgroup'
2017-10-09 03:24:27.239581 7f70bd576700 1 mds.mds7 9: 'ceph'
2017-10-09 03:24:27.239612 7f70bd576700 1 mds.mds7 respawning with exe /usr/bin/ceph-mds
2017-10-09 03:24:27.239614 7f70bd576700 1 mds.mds7 exe_path /proc/self/exe
2017-10-09 03:24:27.268448 7f9c7eafa240 0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 1337
2017-10-09 03:24:27.271987 7f9c7eafa240 0 pidfile_write: ignore empty --pid-file
2017-10-09 03:24:31.325891 7f9c7789c700 1 mds.mds7 handle_mds_map standby
2017-10-09 03:24:31.332376 7f9c7789c700 1 mds.1.0 handle_mds_map i am now mds.28178286.0 replaying mds.1.0
2017-10-09 03:24:31.332388 7f9c7789c700 1 mds.1.0 handle_mds_map state change up:boot --> up:standby-replay
2017-10-09 03:24:31.332401 7f9c7789c700 1 mds.1.0 replay_start
2017-10-09 03:24:31.332410 7f9c7789c700 1 mds.1.0 recovery set is 0,2,3,4,5,6,7,8
2017-10-09 03:24:31.332425 7f9c7789c700 1 mds.1.0 waiting for osdmap 18285 (which blacklists prior instance)
2017-10-09 03:24:31.351850 7f9c7108f700 0 mds.1.cache creating system inode with ino:0x101
2017-10-09 03:24:31.352204 7f9c7108f700 0 mds.1.cache creating system inode with ino:0x1
2017-10-09 03:24:32.144505 7f9c7008d700 0 mds.1.cache creating system inode with ino:0x100
2017-10-09 03:24:32.144671 7f9c7008d700 1 mds.1.0 replay_done (as standby)
2017-10-09 03:24:33.150117 7f9c71890700 1 mds.1.0 replay_done (as standby)

After that, the last line repeats unchanged every second for about two hours.

Where can I go from here? Is there anything further I can do?

Also, just in case it matters: at the time of the crash a large 'rm -rf' (lots and lots of small files) was running. All clients mount the cephfs with kernel 4.13.4, not fuse.

Regards,
Daniel
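
P.S. For reference, this is roughly how I correlated the two logs. The grep patterns and log paths are just what we use here (default /var/log/ceph layout, MDS daemon id "mds7"), so adjust them to your own setup:

  # on the mon host: when/why was rank 6 marked damaged?
  grep 'marking rank 6 damaged' /var/log/ceph/ceph-mon.*.log
  grep 'mds\.6 .*down:damaged' /var/log/ceph/ceph-mon.*.log

  # on mds7 (the daemon that held rank 6 at that time): the minute around the event
  grep '2017-10-09 03:24:2' /var/log/ceph/ceph-mds.mds7.log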