Re: how to debug (in order to repair) damaged MDS (rank)?

Hi John,

thank you very much for your help.

On 10/10/2017 12:57 PM, John Spray wrote:
>  A) Do a "rados -p <metadata pool> ls | grep "^506\." or similar, to
> get a list of the objects

done; that gives me these:

  506.00000000
  506.00000017
  506.0000001b
  506.00000019
  506.0000001a
  506.0000001c
  506.00000018
  506.00000016
  506.0000001e
  506.0000001f
  506.0000001d

>  B) Write a short bash loop to do a "rados -p <metadata pool> get" on
> each of those objects into a file.

done; I saved each object to a file named after it (see the loop sketch
after the listing), resulting in these 11 files:

   90 Oct 10 13:17 506.00000000
 4.0M Oct 10 13:17 506.00000016
 4.0M Oct 10 13:17 506.00000017
 4.0M Oct 10 13:17 506.00000018
 4.0M Oct 10 13:17 506.00000019
 4.0M Oct 10 13:17 506.0000001a
 4.0M Oct 10 13:17 506.0000001b
 4.0M Oct 10 13:17 506.0000001c
 4.0M Oct 10 13:17 506.0000001d
 4.0M Oct 10 13:17 506.0000001e
 4.0M Oct 10 13:17 506.0000001f
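
For reference, the fetch loop was roughly the following (pool name left
as a placeholder, same as above):

   for obj in $(rados -p <metadata_pool> ls | grep '^506\.'); do
       rados -p <metadata_pool> get "$obj" "$obj"   # object name doubles as the local filename
   done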

>  C) Stop the MDS, set "debug mds = 20" and "debug journaler = 20",
> mark the rank repaired, start the MDS again, and then gather the
> resulting log (it should end in the same "Error -22 recovering
> write_pos", but have much much more detail about what came before).

I've attached the entire log, from right before issuing "repaired" until
after the MDS drops back to standby.
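
For completeness, the sequence was roughly this; the repaired rank is 6,
and the start command matches the respawn arguments visible in the log
(the systemd unit name is an assumption and depends on the deployment):

   # stop the MDS (unit name depends on how the daemon was deployed)
   systemctl stop ceph-mds@mds9
   # mark the damaged rank as repaired so an MDS will take it again
   ceph mds repaired 6
   # start the MDS again with the extra debug levels
   ceph-mds -i mds9 -f --debug-mds=20 --debug-journaler=20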

> Because you've hit a serious bug, it's really important to gather all
> this and share it, so that we can try to fix it and prevent it
> happening again to you or others.

absolutely, sure. If you need anything more, I'm happy to share.

> You have two options, depending on how much downtime you can tolerate:
>  - carefully remove all the metadata objects that start with 506. --

given the outage (and people needing access to their data), I'd go with
this. Just to be safe: would that go like this?

  rados -p <metadata_pool> rm 506.00000000
  rados -p <metadata_pool> rm 506.00000016
  [...]
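
Or as a loop over everything the earlier listing returned (again, only
once you confirm this is the right approach):

   for obj in $(rados -p <metadata_pool> ls | grep '^506\.'); do
       rados -p <metadata_pool> rm "$obj"
   done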

Regards,
Daniel
2017-10-10 13:21:55.413752 7f3f3011a700  5 mds.mds9 handle_mds_map epoch 96224 from mon.0
2017-10-10 13:21:55.413836 7f3f3011a700 10 mds.mds9      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}
2017-10-10 13:21:55.413847 7f3f3011a700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-10 13:21:55.413852 7f3f3011a700 10 mds.mds9 map says I am 147.87.226.189:6800/1634944095 mds.6.96224 state up:replay
2017-10-10 13:21:55.414088 7f3f3011a700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 13:21:55.414141 7f3f3011a700 10 mds.mds9 handle_mds_map: initializing MDS rank 6
2017-10-10 13:21:55.414410 7f3f3011a700 10 mds.6.0 update_log_config log_to_monitors {default=true}
2017-10-10 13:21:55.414415 7f3f3011a700 10 mds.6.0 create_logger
2017-10-10 13:21:55.414635 7f3f3011a700  7 mds.6.server operator(): full = 0 epoch = 0
2017-10-10 13:21:55.414644 7f3f3011a700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 13:21:55.414648 7f3f3011a700  4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries
2017-10-10 13:21:55.414660 7f3f3011a700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 13:21:55.414830 7f3f3011a700 10 mds.mds9 handle_mds_map: handling map as rank 6
2017-10-10 13:21:55.414839 7f3f3011a700  1 mds.6.96224 handle_mds_map i am now mds.6.96224
2017-10-10 13:21:55.414843 7f3f3011a700  1 mds.6.96224 handle_mds_map state change up:boot --> up:replay
2017-10-10 13:21:55.414855 7f3f3011a700 10 mds.beacon.mds9 set_want_state: up:standby -> up:replay
2017-10-10 13:21:55.414859 7f3f3011a700  1 mds.6.96224 replay_start
2017-10-10 13:21:55.414873 7f3f3011a700  7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8
2017-10-10 13:21:55.414883 7f3f3011a700  1 mds.6.96224  recovery set is 0,1,2,3,4,5,7,8
2017-10-10 13:21:55.414893 7f3f3011a700  1 mds.6.96224  waiting for osdmap 18607 (which blacklists prior instance)
2017-10-10 13:21:55.414901 7f3f3011a700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 13:21:55.416011 7f3f3011a700  7 mds.6.server operator(): full = 0 epoch = 18608
2017-10-10 13:21:55.416024 7f3f3011a700  4 mds.6.96224 handle_osd_map epoch 18608, 0 new blacklist entries
2017-10-10 13:21:55.416027 7f3f3011a700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 13:21:55.416076 7f3f2a10e700 10 MDSIOContextBase::complete: 12C_IO_Wrapper
2017-10-10 13:21:55.416095 7f3f2a10e700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart
2017-10-10 13:21:55.416101 7f3f2a10e700  2 mds.6.96224 boot_start 0: opening inotable
2017-10-10 13:21:55.416120 7f3f2a10e700 10 mds.6.inotable: load
2017-10-10 13:21:55.416301 7f3f2a10e700  2 mds.6.96224 boot_start 0: opening sessionmap
2017-10-10 13:21:55.416310 7f3f2a10e700 10 mds.6.sessionmap load
2017-10-10 13:21:55.416421 7f3f2a10e700  2 mds.6.96224 boot_start 0: opening mds log
2017-10-10 13:21:55.416430 7f3f2a10e700  5 mds.6.log open discovering log bounds
2017-10-10 13:21:55.416542 7f3f2990d700  4 mds.6.journalpointer Reading journal pointer '406.00000000'
2017-10-10 13:21:55.416603 7f3f2910c700 10 mds.6.log _submit_thread start
2017-10-10 13:21:55.417795 7f3f2a10e700 10 MDSIOContextBase::complete: 12C_IO_MT_Load
2017-10-10 13:21:55.417810 7f3f2a10e700 10 mds.6.inotable: load_2 got 34 bytes
2017-10-10 13:21:55.417816 7f3f2a10e700 10 mds.6.inotable: load_2 loaded v1
2017-10-10 13:21:55.417946 7f3f2a10e700 10 MDSIOContextBase::complete: N12_GLOBAL__N_112C_IO_SM_LoadE
2017-10-10 13:21:55.417953 7f3f2a10e700 10 mds.6.sessionmap _load_finish loaded version 0
2017-10-10 13:21:55.417956 7f3f2a10e700 10 mds.6.sessionmap _load_finish: omap load complete
2017-10-10 13:21:55.417958 7f3f2a10e700 10 mds.6.sessionmap _load_finish: v 0, 0 sessions
2017-10-10 13:21:55.417960 7f3f2a10e700 10 mds.6.sessionmap dump
2017-10-10 13:21:55.418287 7f3f2990d700  1 mds.6.journaler.mdlog(ro) recover start
2017-10-10 13:21:55.418298 7f3f2990d700  1 mds.6.journaler.mdlog(ro) read_head
2017-10-10 13:21:55.418406 7f3f2990d700  4 mds.6.log Waiting for journal 0x206 to recover...
2017-10-10 13:21:55.419926 7f3f2a10e700  1 mds.6.journaler.mdlog(ro) _finish_read_head loghead(trim 26013073408, expire 26013073408, write 26013073408, stream_format 1).  probing for end of log (from 26013073408)...
2017-10-10 13:21:55.419947 7f3f2a10e700  1 mds.6.journaler.mdlog(ro) probing for end of the log
2017-10-10 13:21:55.420677 7f3f2a10e700  1 mds.6.journaler.mdlog(ro) _finish_probe_end write_pos = 26013073456 (header had 26013073408). recovered.
2017-10-10 13:21:55.420740 7f3f2990d700  4 mds.6.log Journal 0x206 recovered.
2017-10-10 13:21:55.420756 7f3f2990d700  4 mds.6.log Recovered journal 0x206 in format 1
2017-10-10 13:21:55.420763 7f3f2990d700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart
2017-10-10 13:21:55.420765 7f3f2990d700  2 mds.6.96224 boot_start 1: loading/discovering base inodes
2017-10-10 13:21:55.420784 7f3f2990d700  0 mds.6.cache creating system inode with ino:0x106
2017-10-10 13:21:55.420831 7f3f2990d700 10 mds.6.cache.ino(0x106) fetch
2017-10-10 13:21:55.421116 7f3f2990d700  4 mds.6.purge_queue open: opening
2017-10-10 13:21:55.421122 7f3f2990d700  1 mds.6.journaler.pq(ro) recover start
2017-10-10 13:21:55.421124 7f3f2990d700  1 mds.6.journaler.pq(ro) read_head
2017-10-10 13:21:55.421231 7f3f2990d700  0 mds.6.cache creating system inode with ino:0x1
2017-10-10 13:21:55.422532 7f3f2a10e700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched
2017-10-10 13:21:55.422548 7f3f2a10e700 10 mds.6.cache.ino(0x106) _fetched got 0 and 536
2017-10-10 13:21:55.422556 7f3f2a10e700 10 mds.6.cache.ino(0x106)  magic is 'ceph fs volume v011' (expecting 'ceph fs volume v011')
2017-10-10 13:21:55.422584 7f3f2a10e700 10  mds.6.cache.snaprealm(0x106 seq 1 0x55b192f65c00) open_parents [1,head]
2017-10-10 13:21:55.422593 7f3f2a10e700 20 mds.6.cache.ino(0x106) decode_snap_blob snaprealm(0x106 seq 1 lc 0 cr 0 cps 1 snaps={} 0x55b192f65c00)
2017-10-10 13:21:55.422598 7f3f2a10e700 10 mds.6.cache.ino(0x106) _fetched [inode 0x106 [...2,head] ~mds6/ auth v19 snaprealm=0x55b192f65c00 f(v0 10=0+10) n(v3 rc2017-10-03 22:56:32.400835 b6253 88=11+77)/n(v0 11=0+11) (iversion lock) 0x55b193176700]
2017-10-10 13:21:55.831091 7f3f2b110700  1 mds.6.journaler.pq(ro) _finish_read_head loghead(trim 104857600, expire 108687220, write 108868115, stream_format 1).  probing for end of log (from 108868115)...
2017-10-10 13:21:55.831107 7f3f2b110700  1 mds.6.journaler.pq(ro) probing for end of the log
2017-10-10 13:21:55.841213 7f3f2b110700  1 mds.6.journaler.pq(ro) _finish_probe_end write_pos = 134217728 (header had 108868115). recovered.
2017-10-10 13:21:55.841234 7f3f2b110700  4 mds.6.purge_queue operator(): open complete
2017-10-10 13:21:55.841236 7f3f2b110700  4 mds.6.purge_queue operator(): recovering write_pos
2017-10-10 13:21:55.841239 7f3f2b110700 10 mds.6.journaler.pq(ro) _prefetch
2017-10-10 13:21:55.841241 7f3f2b110700 10 mds.6.journaler.pq(ro) _prefetch 41943040 requested_pos 108868115 < target 134217728 (150811155), prefetching 25349613
2017-10-10 13:21:55.841246 7f3f2b110700 10 mds.6.journaler.pq(ro) _issue_read reading 108868115~25349613, read pointers 108868115/108868115/134217728
2017-10-10 13:21:55.841564 7f3f2b110700 10 mds.6.journaler.pq(ro) wait_for_readable at 108868115 onreadable 0x55b193232840
2017-10-10 13:21:55.842864 7f3f2b110700 10 mds.6.journaler.pq(ro) _finish_read got 108868115~183789
2017-10-10 13:21:55.842882 7f3f2b110700 10 mds.6.journaler.pq(ro) _assimilate_prefetch 108868115~183789
2017-10-10 13:21:55.842886 7f3f2b110700 10 mds.6.journaler.pq(ro) _assimilate_prefetch read_buf now 108868115~183789, read pointers 108868115/109051904/134217728
2017-10-10 13:21:55.842965 7f3f2b110700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch
2017-10-10 13:21:55.842979 7f3f2b110700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos
2017-10-10 13:21:55.842983 7f3f2b110700 10 mds.beacon.mds9 set_want_state: up:replay -> down:damaged
2017-10-10 13:21:55.842990 7f3f2b110700 20 mds.beacon.mds9 0 slow request found
2017-10-10 13:21:55.843019 7f3f2b110700 10 mds.beacon.mds9 _send down:damaged seq 33
2017-10-10 13:21:55.843059 7f3f2b110700 20 mds.beacon.mds9 send_and_wait: awaiting 33 for up to 5s
2017-10-10 13:21:55.860786 7f3f3011a700 10 mds.beacon.mds9 handle_mds_beacon down:damaged seq 33 rtt 0.017763
2017-10-10 13:21:55.860818 7f3f2b110700  1 mds.mds9 respawn
2017-10-10 13:21:55.860824 7f3f2b110700  1 mds.mds9  e: 'ceph-mds'
2017-10-10 13:21:55.860826 7f3f2b110700  1 mds.mds9  0: 'ceph-mds'
2017-10-10 13:21:55.860829 7f3f2b110700  1 mds.mds9  1: '-i'
2017-10-10 13:21:55.860831 7f3f2b110700  1 mds.mds9  2: 'mds9'
2017-10-10 13:21:55.860834 7f3f2b110700  1 mds.mds9  3: '-f'
2017-10-10 13:21:55.860863 7f3f2b110700  1 mds.mds9  4: '--debug-mds=20'
2017-10-10 13:21:55.860864 7f3f2b110700  1 mds.mds9  5: '--debug-journaler=20'
2017-10-10 13:21:55.860911 7f3f2b110700  1 mds.mds9 respawning with exe /usr/bin/ceph-mds
2017-10-10 13:21:55.860919 7f3f2b110700  1 mds.mds9  exe_path /proc/self/exe
2017-10-10 13:21:55.887868 7f1f31970240  0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 3286
2017-10-10 13:21:55.891577 7f1f31970240  0 pidfile_write: ignore empty --pid-file
2017-10-10 13:21:55.894535 7f1f31970240 10 mds.mds9 112 MDSCacheObject
2017-10-10 13:21:55.894541 7f1f31970240 10 mds.mds9 1624        CInode
2017-10-10 13:21:55.894543 7f1f31970240 10 mds.mds9 16   elist<>::item   *7=112
2017-10-10 13:21:55.894546 7f1f31970240 10 mds.mds9 504  inode_t
2017-10-10 13:21:55.894548 7f1f31970240 10 mds.mds9 48    nest_info_t
2017-10-10 13:21:55.894549 7f1f31970240 10 mds.mds9 40    frag_info_t
2017-10-10 13:21:55.894551 7f1f31970240 10 mds.mds9 40   SimpleLock   *5=200
2017-10-10 13:21:55.894553 7f1f31970240 10 mds.mds9 48   ScatterLock  *3=144
2017-10-10 13:21:55.894554 7f1f31970240 10 mds.mds9 456 CDentry
2017-10-10 13:21:55.894556 7f1f31970240 10 mds.mds9 16   elist<>::item
2017-10-10 13:21:55.894557 7f1f31970240 10 mds.mds9 40   SimpleLock
2017-10-10 13:21:55.894558 7f1f31970240 10 mds.mds9 824 CDir
2017-10-10 13:21:55.894559 7f1f31970240 10 mds.mds9 16   elist<>::item   *2=32
2017-10-10 13:21:55.894561 7f1f31970240 10 mds.mds9 232  fnode_t
2017-10-10 13:21:55.894562 7f1f31970240 10 mds.mds9 48    nest_info_t *2
2017-10-10 13:21:55.894563 7f1f31970240 10 mds.mds9 40    frag_info_t *2
2017-10-10 13:21:55.894565 7f1f31970240 10 mds.mds9 272 Capability
2017-10-10 13:21:55.894566 7f1f31970240 10 mds.mds9 32   xlist<>::item   *2=64
2017-10-10 13:21:55.895560 7f1f2ceef700 10 mds.mds9 MDSDaemon::ms_get_authorizer type=mon
2017-10-10 13:21:55.899616 7f1f31970240 10 mds.beacon.mds9 _send up:boot seq 1
2017-10-10 13:21:55.899932 7f1f2a712700  5 mds.mds9 handle_mds_map epoch 96224 from mon.0
2017-10-10 13:21:55.899982 7f1f2a712700 10 mds.mds9      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}
2017-10-10 13:21:55.899992 7f1f2a712700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-10 13:21:55.899999 7f1f2a712700 10 mds.mds9 map says I am 147.87.226.189:6800/3326997988 mds.-1.-1 state ???
2017-10-10 13:21:55.900004 7f1f2a712700 10 mds.mds9 handle_mds_map: handling map in rankless mode
2017-10-10 13:21:55.900007 7f1f2a712700 10 mds.mds9 not in map yet
2017-10-10 13:21:55.900583 7f1f2c6ee700 10 mds.mds9 MDSDaemon::ms_get_authorizer type=mgr
2017-10-10 13:21:56.459839 7f1f2a712700  5 mds.mds9 handle_mds_map epoch 96225 from mon.0
2017-10-10 13:21:56.459868 7f1f2a712700 10 mds.mds9      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}
2017-10-10 13:21:56.459874 7f1f2a712700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-10 13:21:56.459877 7f1f2a712700 10 mds.mds9 map says I am 147.87.226.189:6800/3326997988 mds.-1.-1 state ???
2017-10-10 13:21:56.459882 7f1f2a712700 10 mds.mds9 handle_mds_map: handling map in rankless mode
2017-10-10 13:21:56.459884 7f1f2a712700 10 mds.mds9 not in map yet
2017-10-10 13:21:59.899754 7f1f2770c700 10 mds.beacon.mds9 _send up:boot seq 2
2017-10-10 13:22:00.622000 7f1f2a712700  5 mds.mds9 handle_mds_map epoch 96226 from mon.0
2017-10-10 13:22:00.622048 7f1f2a712700 10 mds.mds9      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}
2017-10-10 13:22:00.622055 7f1f2a712700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-10 13:22:00.622060 7f1f2a712700 10 mds.mds9 map says I am 147.87.226.189:6800/3326997988 mds.-1.0 state up:standby
2017-10-10 13:22:00.622067 7f1f2a712700 10 mds.mds9 handle_mds_map: handling map in rankless mode
2017-10-10 13:22:00.622082 7f1f2a712700 10 mds.beacon.mds9 set_want_state: up:boot -> up:standby
2017-10-10 13:22:00.622089 7f1f2a712700  1 mds.mds9 handle_mds_map standby
2017-10-10 13:22:00.622250 7f1f2a712700 10 mds.beacon.mds9 handle_mds_beacon up:boot seq 2 rtt 0.722471
2017-10-10 13:22:03.899866 7f1f2770c700 10 mds.beacon.mds9 _send up:standby seq 3
2017-10-10 13:22:03.900760 7f1f2a712700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 3 rtt 0.000866
2017-10-10 13:22:07.899968 7f1f2770c700 10 mds.beacon.mds9 _send up:standby seq 4
2017-10-10 13:22:07.900735 7f1f2a712700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 4 rtt 0.000743
2017-10-10 13:22:11.900072 7f1f2770c700 10 mds.beacon.mds9 _send up:standby seq 5
2017-10-10 13:22:11.900898 7f1f2a712700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 5 rtt 0.000799
2017-10-10 13:22:15.900170 7f1f2770c700 10 mds.beacon.mds9 _send up:standby seq 6
2017-10-10 13:22:15.900927 7f1f2a712700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 6 rtt 0.000737
2017-10-10 13:22:19.900296 7f1f2770c700 10 mds.beacon.mds9 _send up:standby seq 7
2017-10-10 13:22:19.901092 7f1f2a712700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 7 rtt 0.000759
2017-10-10 13:22:23.900398 7f1f2770c700 10 mds.beacon.mds9 _send up:standby seq 8
2017-10-10 13:22:23.901256 7f1f2a712700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 8 rtt 0.000830
2017-10-10 13:22:27.900492 7f1f2770c700 10 mds.beacon.mds9 _send up:standby seq 9
2017-10-10 13:22:27.901297 7f1f2a712700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 9 rtt 0.000784
