Re: cephfs: how to repair damaged mds rank?

On 10/09/2017 09:17 AM, Daniel Baumann wrote:
> The relevant portion from the ceph-mds log (when starting mds9 which
> should then take up rank 6; I'm happy to provide any logs):

I've turned up the logging (see attachment). Could it be that we hit
this bug here?

http://tracker.ceph.com/issues/17670
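
For reference, the daemon was started with the debug flags directly on
the command line (the same arguments show up again in the respawn lines
further down in the attached log). Roughly:

  # restart the MDS with verbose mds/journaler logging
  ceph-mds -i mds9 -f --debug-mds=100 --debug-journaler=100

  # raising the levels on an already running daemon should also work,
  # though that variant is untested here:
  ceph tell mds.mds9 injectargs '--debug_mds 20 --debug_journaler 20'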

Regards,
Daniel
2017-10-09 10:07:14.677308 7f7972bd6700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 6 rtt 0.000642
2017-10-09 10:07:15.547453 7f7972bd6700  5 mds.mds9 handle_mds_map epoch 96022 from mon.0
2017-10-09 10:07:15.547526 7f7972bd6700 10 mds.mds9      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}
2017-10-09 10:07:15.547546 7f7972bd6700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-09 10:07:15.547555 7f7972bd6700 10 mds.mds9 map says I am 147.87.226.189:6800/6621615 mds.6.96022 state up:replay
2017-10-09 10:07:15.547825 7f7972bd6700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-09 10:07:15.547882 7f7972bd6700 10 mds.mds9 handle_mds_map: initializing MDS rank 6
2017-10-09 10:07:15.548165 7f7972bd6700 10 mds.6.0 update_log_config log_to_monitors {default=true}
2017-10-09 10:07:15.548171 7f7972bd6700 10 mds.6.0 create_logger
2017-10-09 10:07:15.548410 7f7972bd6700  7 mds.6.server operator(): full = 0 epoch = 0
2017-10-09 10:07:15.548423 7f7972bd6700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-09 10:07:15.548427 7f7972bd6700  4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries
2017-10-09 10:07:15.548439 7f7972bd6700 10 mds.6.server apply_blacklist: killed 0
2017-10-09 10:07:15.548634 7f7972bd6700 10 mds.mds9 handle_mds_map: handling map as rank 6
2017-10-09 10:07:15.548647 7f7972bd6700  1 mds.6.96022 handle_mds_map i am now mds.6.96022
2017-10-09 10:07:15.548650 7f7972bd6700  1 mds.6.96022 handle_mds_map state change up:boot --> up:replay
2017-10-09 10:07:15.548668 7f7972bd6700 10 mds.beacon.mds9 set_want_state: up:standby -> up:replay
2017-10-09 10:07:15.548687 7f7972bd6700  1 mds.6.96022 replay_start
2017-10-09 10:07:15.548699 7f7972bd6700  7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8
2017-10-09 10:07:15.548706 7f7972bd6700  1 mds.6.96022  recovery set is 0,1,2,3,4,5,7,8
2017-10-09 10:07:15.548720 7f7972bd6700  1 mds.6.96022  waiting for osdmap 18484 (which blacklists prior instance)
2017-10-09 10:07:15.548737 7f7972bd6700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-09 10:07:15.549521 7f7972bd6700  7 mds.6.server operator(): full = 0 epoch = 18492
2017-10-09 10:07:15.549534 7f7972bd6700  4 mds.6.96022 handle_osd_map epoch 18492, 0 new blacklist entries
2017-10-09 10:07:15.549537 7f7972bd6700 10 mds.6.server apply_blacklist: killed 0
2017-10-09 10:07:15.549582 7f796cbca700 10 MDSIOContextBase::complete: 12C_IO_Wrapper
2017-10-09 10:07:15.549679 7f796cbca700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart
2017-10-09 10:07:15.549685 7f796cbca700  2 mds.6.96022 boot_start 0: opening inotable
2017-10-09 10:07:15.549695 7f796cbca700 10 mds.6.inotable: load
2017-10-09 10:07:15.549880 7f796cbca700  2 mds.6.96022 boot_start 0: opening sessionmap
2017-10-09 10:07:15.549888 7f796cbca700 10 mds.6.sessionmap load
2017-10-09 10:07:15.549977 7f796cbca700  2 mds.6.96022 boot_start 0: opening mds log
2017-10-09 10:07:15.549984 7f796cbca700  5 mds.6.log open discovering log bounds
2017-10-09 10:07:15.550113 7f796c3c9700  4 mds.6.journalpointer Reading journal pointer '406.00000000'
2017-10-09 10:07:15.550132 7f796bbc8700 10 mds.6.log _submit_thread start
2017-10-09 10:07:15.551165 7f796cbca700 10 MDSIOContextBase::complete: 12C_IO_MT_Load
2017-10-09 10:07:15.551178 7f796cbca700 10 mds.6.inotable: load_2 got 34 bytes
2017-10-09 10:07:15.551184 7f796cbca700 10 mds.6.inotable: load_2 loaded v1
2017-10-09 10:07:15.565382 7f796cbca700 10 MDSIOContextBase::complete: N12_GLOBAL__N_112C_IO_SM_LoadE
2017-10-09 10:07:15.565397 7f796cbca700 10 mds.6.sessionmap _load_finish loaded version 0
2017-10-09 10:07:15.565401 7f796cbca700 10 mds.6.sessionmap _load_finish: omap load complete
2017-10-09 10:07:15.565403 7f796cbca700 10 mds.6.sessionmap _load_finish: v 0, 0 sessions
2017-10-09 10:07:15.565408 7f796cbca700 10 mds.6.sessionmap dump
2017-10-09 10:07:15.583721 7f796c3c9700  1 mds.6.journaler.mdlog(ro) recover start
2017-10-09 10:07:15.583732 7f796c3c9700  1 mds.6.journaler.mdlog(ro) read_head
2017-10-09 10:07:15.583854 7f796c3c9700  4 mds.6.log Waiting for journal 0x206 to recover...
2017-10-09 10:07:15.796523 7f796cbca700  1 mds.6.journaler.mdlog(ro) _finish_read_head loghead(trim 25992101888, expire 25992101888, write 25992101888, stream_format 1).  probing for end of log (from 25992101888)...
2017-10-09 10:07:15.796545 7f796cbca700  1 mds.6.journaler.mdlog(ro) probing for end of the log
2017-10-09 10:07:15.797962 7f796cbca700  1 mds.6.journaler.mdlog(ro) _finish_probe_end write_pos = 25992101936 (header had 25992101888). recovered.
2017-10-09 10:07:15.798040 7f796c3c9700  4 mds.6.log Journal 0x206 recovered.
2017-10-09 10:07:15.798060 7f796c3c9700  4 mds.6.log Recovered journal 0x206 in format 1
2017-10-09 10:07:15.798067 7f796c3c9700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart
2017-10-09 10:07:15.798069 7f796c3c9700  2 mds.6.96022 boot_start 1: loading/discovering base inodes
2017-10-09 10:07:15.798088 7f796c3c9700  0 mds.6.cache creating system inode with ino:0x106
2017-10-09 10:07:15.798131 7f796c3c9700 10 mds.6.cache.ino(0x106) fetch
2017-10-09 10:07:15.798365 7f796c3c9700  4 mds.6.purge_queue open: opening
2017-10-09 10:07:15.798372 7f796c3c9700  1 mds.6.journaler.pq(ro) recover start
2017-10-09 10:07:15.798375 7f796c3c9700  1 mds.6.journaler.pq(ro) read_head
2017-10-09 10:07:15.798459 7f796c3c9700  0 mds.6.cache creating system inode with ino:0x1
2017-10-09 10:07:15.799812 7f796dbcc700  1 mds.6.journaler.pq(ro) _finish_read_head loghead(trim 104857600, expire 108687220, write 108868115, stream_format 1).  probing for end of log (from 108868115)...
2017-10-09 10:07:15.799830 7f796dbcc700  1 mds.6.journaler.pq(ro) probing for end of the log
2017-10-09 10:07:15.799830 7f796cbca700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched
2017-10-09 10:07:15.799839 7f796cbca700 10 mds.6.cache.ino(0x106) _fetched got 0 and 536
2017-10-09 10:07:15.799848 7f796cbca700 10 mds.6.cache.ino(0x106)  magic is 'ceph fs volume v011' (expecting 'ceph fs volume v011')
2017-10-09 10:07:15.799879 7f796cbca700 10  mds.6.cache.snaprealm(0x106 seq 1 0x561dd6029c00) open_parents [1,head]
2017-10-09 10:07:15.799884 7f796cbca700 20 mds.6.cache.ino(0x106) decode_snap_blob snaprealm(0x106 seq 1 lc 0 cr 0 cps 1 snaps={} 0x561dd6029c00)
2017-10-09 10:07:15.799890 7f796cbca700 10 mds.6.cache.ino(0x106) _fetched [inode 0x106 [...2,head] ~mds6/ auth v19 snaprealm=0x561dd6029c00 f(v0 10=0+10) n(v3 rc2017-10-03 22:56:32.400835 b6253 88=11+77)/n(v0 11=0+11) (iversion lock) 0x561dd623a700]
2017-10-09 10:07:15.809362 7f796dbcc700  1 mds.6.journaler.pq(ro) _finish_probe_end write_pos = 134217728 (header had 108868115). recovered.
2017-10-09 10:07:15.809384 7f796dbcc700  4 mds.6.purge_queue operator(): open complete
2017-10-09 10:07:15.809387 7f796dbcc700  4 mds.6.purge_queue operator(): recovering write_pos
2017-10-09 10:07:15.809407 7f796dbcc700 10 mds.6.journaler.pq(ro) _prefetch
2017-10-09 10:07:15.809409 7f796dbcc700 10 mds.6.journaler.pq(ro) _prefetch 41943040 requested_pos 108868115 < target 134217728 (150811155), prefetching 25349613
2017-10-09 10:07:15.809415 7f796dbcc700 10 mds.6.journaler.pq(ro) _issue_read reading 108868115~25349613, read pointers 108868115/108868115/134217728
2017-10-09 10:07:15.809741 7f796dbcc700 10 mds.6.journaler.pq(ro) wait_for_readable at 108868115 onreadable 0x561dd62d0840
2017-10-09 10:07:15.811541 7f796dbcc700 10 mds.6.journaler.pq(ro) _finish_read got 108868115~183789
2017-10-09 10:07:15.811561 7f796dbcc700 10 mds.6.journaler.pq(ro) _assimilate_prefetch 108868115~183789
2017-10-09 10:07:15.811565 7f796dbcc700 10 mds.6.journaler.pq(ro) _assimilate_prefetch read_buf now 108868115~183789, read pointers 108868115/109051904/134217728
2017-10-09 10:07:15.811624 7f796dbcc700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch
2017-10-09 10:07:15.811636 7f796dbcc700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos
2017-10-09 10:07:15.811640 7f796dbcc700 10 mds.beacon.mds9 set_want_state: up:replay -> down:damaged
2017-10-09 10:07:15.811647 7f796dbcc700 20 mds.beacon.mds9 0 slow request found
2017-10-09 10:07:15.811677 7f796dbcc700 10 mds.beacon.mds9 _send down:damaged seq 7
2017-10-09 10:07:15.811724 7f796dbcc700 20 mds.beacon.mds9 send_and_wait: awaiting 7 for up to 5s
2017-10-09 10:07:15.828725 7f7972bd6700 10 mds.beacon.mds9 handle_mds_beacon down:damaged seq 7 rtt 0.017043
2017-10-09 10:07:15.828782 7f796dbcc700  1 mds.mds9 respawn
2017-10-09 10:07:15.828796 7f796dbcc700  1 mds.mds9  e: 'ceph-mds'
2017-10-09 10:07:15.828798 7f796dbcc700  1 mds.mds9  0: 'ceph-mds'
2017-10-09 10:07:15.828801 7f796dbcc700  1 mds.mds9  1: '-i'
2017-10-09 10:07:15.828802 7f796dbcc700  1 mds.mds9  2: 'mds9'
2017-10-09 10:07:15.828805 7f796dbcc700  1 mds.mds9  3: '-f'
2017-10-09 10:07:15.828806 7f796dbcc700  1 mds.mds9  4: '--debug-mds=100'
2017-10-09 10:07:15.828807 7f796dbcc700  1 mds.mds9  5: '--debug-journaler=100'
2017-10-09 10:07:15.828848 7f796dbcc700  1 mds.mds9 respawning with exe /usr/bin/ceph-mds
2017-10-09 10:07:15.828851 7f796dbcc700  1 mds.mds9  exe_path /proc/self/exe
2017-10-09 10:07:15.854091 7f629cc1b240  0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 653
2017-10-09 10:07:15.857514 7f629cc1b240  0 pidfile_write: ignore empty --pid-file
2017-10-09 10:07:15.859999 7f629cc1b240 10 mds.mds9 112 MDSCacheObject
2017-10-09 10:07:15.860005 7f629cc1b240 10 mds.mds9 1624        CInode
2017-10-09 10:07:15.860006 7f629cc1b240 10 mds.mds9 16   elist<>::item   *7=112
2017-10-09 10:07:15.860007 7f629cc1b240 10 mds.mds9 504  inode_t
2017-10-09 10:07:15.860008 7f629cc1b240 10 mds.mds9 48    nest_info_t
2017-10-09 10:07:15.860009 7f629cc1b240 10 mds.mds9 40    frag_info_t
2017-10-09 10:07:15.860011 7f629cc1b240 10 mds.mds9 40   SimpleLock   *5=200
2017-10-09 10:07:15.860013 7f629cc1b240 10 mds.mds9 48   ScatterLock  *3=144
2017-10-09 10:07:15.860014 7f629cc1b240 10 mds.mds9 456 CDentry
2017-10-09 10:07:15.860015 7f629cc1b240 10 mds.mds9 16   elist<>::item
2017-10-09 10:07:15.860016 7f629cc1b240 10 mds.mds9 40   SimpleLock
2017-10-09 10:07:15.860017 7f629cc1b240 10 mds.mds9 824 CDir
2017-10-09 10:07:15.860018 7f629cc1b240 10 mds.mds9 16   elist<>::item   *2=32
2017-10-09 10:07:15.860019 7f629cc1b240 10 mds.mds9 232  fnode_t
2017-10-09 10:07:15.860020 7f629cc1b240 10 mds.mds9 48    nest_info_t *2
2017-10-09 10:07:15.860021 7f629cc1b240 10 mds.mds9 40    frag_info_t *2
2017-10-09 10:07:15.860022 7f629cc1b240 10 mds.mds9 272 Capability
2017-10-09 10:07:15.860023 7f629cc1b240 10 mds.mds9 32   xlist<>::item   *2=64
2017-10-09 10:07:15.861045 7f629819a700 10 mds.mds9 MDSDaemon::ms_get_authorizer type=mon
2017-10-09 10:07:15.869017 7f629cc1b240 10 mds.beacon.mds9 _send up:boot seq 1
2017-10-09 10:07:15.869682 7f62959bd700  5 mds.mds9 handle_mds_map epoch 96022 from mon.0
2017-10-09 10:07:15.869722 7f62959bd700 10 mds.mds9      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}
2017-10-09 10:07:15.869733 7f62959bd700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-09 10:07:15.869739 7f62959bd700 10 mds.mds9 map says I am 147.87.226.189:6800/2626079655 mds.-1.-1 state ???
2017-10-09 10:07:15.869745 7f62959bd700 10 mds.mds9 handle_mds_map: handling map in rankless mode
2017-10-09 10:07:15.869747 7f62959bd700 10 mds.mds9 not in map yet
2017-10-09 10:07:15.870364 7f6297999700 10 mds.mds9 MDSDaemon::ms_get_authorizer type=mgr
2017-10-09 10:07:16.602117 7f62959bd700  5 mds.mds9 handle_mds_map epoch 96023 from mon.0
2017-10-09 10:07:16.602144 7f62959bd700 10 mds.mds9      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}
2017-10-09 10:07:16.602149 7f62959bd700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-09 10:07:16.602153 7f62959bd700 10 mds.mds9 map says I am 147.87.226.189:6800/2626079655 mds.-1.-1 state ???
2017-10-09 10:07:16.602158 7f62959bd700 10 mds.mds9 handle_mds_map: handling map in rankless mode
2017-10-09 10:07:16.602160 7f62959bd700 10 mds.mds9 not in map yet
2017-10-09 10:07:19.869141 7f62929b7700 10 mds.beacon.mds9 _send up:boot seq 2
2017-10-09 10:07:20.786598 7f62959bd700  5 mds.mds9 handle_mds_map epoch 96024 from mon.0
2017-10-09 10:07:20.786640 7f62959bd700 10 mds.mds9      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}
2017-10-09 10:07:20.786647 7f62959bd700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-09 10:07:20.786653 7f62959bd700 10 mds.mds9 map says I am 147.87.226.189:6800/2626079655 mds.-1.0 state up:standby
2017-10-09 10:07:20.786683 7f62959bd700 10 mds.mds9 handle_mds_map: handling map in rankless mode
2017-10-09 10:07:20.786700 7f62959bd700 10 mds.beacon.mds9 set_want_state: up:boot -> up:standby
2017-10-09 10:07:20.786703 7f62959bd700  1 mds.mds9 handle_mds_map standby
2017-10-09 10:07:20.787124 7f62959bd700 10 mds.beacon.mds9 handle_mds_beacon up:boot seq 2 rtt 0.917961
2017-10-09 10:07:23.869252 7f62929b7700 10 mds.beacon.mds9 _send up:standby seq 3
