On 10/09/2017 09:17 AM, Daniel Baumann wrote: > The relevant portion from the ceph-mds log (when starting mds9 which > should then take up rank 6; I'm happy to provide any logs): i've turned up the logging (see attachment).. could it be that we hit this bug here? http://tracker.ceph.com/issues/17670 Regards, Daniel
2017-10-09 10:07:14.677308 7f7972bd6700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 6 rtt 0.000642 2017-10-09 10:07:15.547453 7f7972bd6700 5 mds.mds9 handle_mds_map epoch 96022 from mon.0 2017-10-09 10:07:15.547526 7f7972bd6700 10 mds.mds9 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in om ap,7=mds uses inline data,8=file layout v2} 2017-10-09 10:07:15.547546 7f7972bd6700 10 mds.mds9 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in om ap,8=file layout v2} 2017-10-09 10:07:15.547555 7f7972bd6700 10 mds.mds9 map says I am 147.87.226.189:6800/6621615 mds.6.96022 state up:replay 2017-10-09 10:07:15.547825 7f7972bd6700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-09 10:07:15.547882 7f7972bd6700 10 mds.mds9 handle_mds_map: initializing MDS rank 6 2017-10-09 10:07:15.548165 7f7972bd6700 10 mds.6.0 update_log_config log_to_monitors {default=true} 2017-10-09 10:07:15.548171 7f7972bd6700 10 mds.6.0 create_logger 2017-10-09 10:07:15.548410 7f7972bd6700 7 mds.6.server operator(): full = 0 epoch = 0 2017-10-09 10:07:15.548423 7f7972bd6700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-09 10:07:15.548427 7f7972bd6700 4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries 2017-10-09 10:07:15.548439 7f7972bd6700 10 mds.6.server apply_blacklist: killed 0 2017-10-09 10:07:15.548634 7f7972bd6700 10 mds.mds9 handle_mds_map: handling map as rank 6 2017-10-09 10:07:15.548647 7f7972bd6700 1 mds.6.96022 handle_mds_map i am now mds.6.96022 2017-10-09 10:07:15.548650 7f7972bd6700 1 mds.6.96022 handle_mds_map state change up:boot --> up:replay 2017-10-09 10:07:15.548668 7f7972bd6700 10 mds.beacon.mds9 set_want_state: up:standby -> up:replay 2017-10-09 10:07:15.548687 7f7972bd6700 1 mds.6.96022 replay_start 2017-10-09 10:07:15.548699 7f7972bd6700 7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8 2017-10-09 10:07:15.548706 7f7972bd6700 1 mds.6.96022 recovery set is 0,1,2,3,4,5,7,8 2017-10-09 10:07:15.548720 7f7972bd6700 1 mds.6.96022 waiting for osdmap 18484 (which blacklists prior instance) 2017-10-09 10:07:15.548737 7f7972bd6700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-09 10:07:15.549521 7f7972bd6700 7 mds.6.server operator(): full = 0 epoch = 18492 2017-10-09 10:07:15.549534 7f7972bd6700 4 mds.6.96022 handle_osd_map epoch 18492, 0 new blacklist entries 2017-10-09 10:07:15.549537 7f7972bd6700 10 mds.6.server apply_blacklist: killed 0 2017-10-09 10:07:15.549582 7f796cbca700 10 MDSIOContextBase::complete: 12C_IO_Wrapper 2017-10-09 10:07:15.549679 7f796cbca700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart 2017-10-09 10:07:15.549685 7f796cbca700 2 mds.6.96022 boot_start 0: opening inotable 2017-10-09 10:07:15.549695 7f796cbca700 10 mds.6.inotable: load 2017-10-09 10:07:15.549880 7f796cbca700 2 mds.6.96022 boot_start 0: opening sessionmap 2017-10-09 10:07:15.549888 7f796cbca700 10 mds.6.sessionmap load 2017-10-09 10:07:15.549977 7f796cbca700 2 mds.6.96022 boot_start 0: opening mds log 2017-10-09 10:07:15.549984 7f796cbca700 5 mds.6.log open discovering log bounds 2017-10-09 10:07:15.550113 7f796c3c9700 4 mds.6.journalpointer Reading journal pointer '406.00000000' 2017-10-09 10:07:15.550132 7f796bbc8700 10 mds.6.log _submit_thread start 2017-10-09 10:07:15.551165 7f796cbca700 10 MDSIOContextBase::complete: 12C_IO_MT_Load 2017-10-09 10:07:15.551178 7f796cbca700 10 mds.6.inotable: load_2 got 34 bytes 2017-10-09 10:07:15.551184 7f796cbca700 10 mds.6.inotable: load_2 loaded v1 2017-10-09 10:07:15.565382 7f796cbca700 10 MDSIOContextBase::complete: N12_GLOBAL__N_112C_IO_SM_LoadE 2017-10-09 10:07:15.565397 7f796cbca700 10 mds.6.sessionmap _load_finish loaded version 0 2017-10-09 10:07:15.565401 7f796cbca700 10 mds.6.sessionmap _load_finish: omap load complete 2017-10-09 10:07:15.565403 7f796cbca700 10 mds.6.sessionmap _load_finish: v 0, 0 sessions 2017-10-09 10:07:15.565408 7f796cbca700 10 mds.6.sessionmap dump 2017-10-09 10:07:15.583721 7f796c3c9700 1 mds.6.journaler.mdlog(ro) recover start 2017-10-09 10:07:15.583732 7f796c3c9700 1 mds.6.journaler.mdlog(ro) read_head 2017-10-09 10:07:15.583854 7f796c3c9700 4 mds.6.log Waiting for journal 0x206 to recover... 2017-10-09 10:07:15.796523 7f796cbca700 1 mds.6.journaler.mdlog(ro) _finish_read_head loghead(trim 25992101888, expire 25992101888, write 25992101888, stream_format 1). probing for end of log (from 25992101888)... 2017-10-09 10:07:15.796545 7f796cbca700 1 mds.6.journaler.mdlog(ro) probing for end of the log 2017-10-09 10:07:15.797962 7f796cbca700 1 mds.6.journaler.mdlog(ro) _finish_probe_end write_pos = 25992101936 (header had 25992101888). recovered. 2017-10-09 10:07:15.798040 7f796c3c9700 4 mds.6.log Journal 0x206 recovered. 2017-10-09 10:07:15.798060 7f796c3c9700 4 mds.6.log Recovered journal 0x206 in format 1 2017-10-09 10:07:15.798067 7f796c3c9700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart 2017-10-09 10:07:15.798069 7f796c3c9700 2 mds.6.96022 boot_start 1: loading/discovering base inodes 2017-10-09 10:07:15.798088 7f796c3c9700 0 mds.6.cache creating system inode with ino:0x106 2017-10-09 10:07:15.798131 7f796c3c9700 10 mds.6.cache.ino(0x106) fetch 2017-10-09 10:07:15.798365 7f796c3c9700 4 mds.6.purge_queue open: opening 2017-10-09 10:07:15.798372 7f796c3c9700 1 mds.6.journaler.pq(ro) recover start 2017-10-09 10:07:15.798375 7f796c3c9700 1 mds.6.journaler.pq(ro) read_head 2017-10-09 10:07:15.798459 7f796c3c9700 0 mds.6.cache creating system inode with ino:0x1 2017-10-09 10:07:15.799812 7f796dbcc700 1 mds.6.journaler.pq(ro) _finish_read_head loghead(trim 104857600, expire 108687220, write 108868115, stream_format 1). probing for end of log (from 108868115)... 2017-10-09 10:07:15.799830 7f796dbcc700 1 mds.6.journaler.pq(ro) probing for end of the log 2017-10-09 10:07:15.799830 7f796cbca700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched 2017-10-09 10:07:15.799839 7f796cbca700 10 mds.6.cache.ino(0x106) _fetched got 0 and 536 2017-10-09 10:07:15.799848 7f796cbca700 10 mds.6.cache.ino(0x106) magic is 'ceph fs volume v011' (expecting 'ceph fs volume v011') 2017-10-09 10:07:15.799879 7f796cbca700 10 mds.6.cache.snaprealm(0x106 seq 1 0x561dd6029c00) open_parents [1,head] 2017-10-09 10:07:15.799884 7f796cbca700 20 mds.6.cache.ino(0x106) decode_snap_blob snaprealm(0x106 seq 1 lc 0 cr 0 cps 1 snaps={} 0x561dd6029c00) 2017-10-09 10:07:15.799890 7f796cbca700 10 mds.6.cache.ino(0x106) _fetched [inode 0x106 [...2,head] ~mds6/ auth v19 snaprealm=0x561dd6029c00 f(v0 10=0+10) n(v3 rc2017-10-03 22:56:32.400835 b6253 88=11+77)/n(v0 11=0+11) (iversion lock) 0x561dd623a700] 2017-10-09 10:07:15.809362 7f796dbcc700 1 mds.6.journaler.pq(ro) _finish_probe_end write_pos = 134217728 (header had 108868115). recovered. 2017-10-09 10:07:15.809384 7f796dbcc700 4 mds.6.purge_queue operator(): open complete 2017-10-09 10:07:15.809387 7f796dbcc700 4 mds.6.purge_queue operator(): recovering write_pos 2017-10-09 10:07:15.809407 7f796dbcc700 10 mds.6.journaler.pq(ro) _prefetch 2017-10-09 10:07:15.809409 7f796dbcc700 10 mds.6.journaler.pq(ro) _prefetch 41943040 requested_pos 108868115 < target 134217728 (150811155), prefetching 25349613 2017-10-09 10:07:15.809415 7f796dbcc700 10 mds.6.journaler.pq(ro) _issue_read reading 108868115~25349613, read pointers 108868115/108868115/134217728 2017-10-09 10:07:15.809741 7f796dbcc700 10 mds.6.journaler.pq(ro) wait_for_readable at 108868115 onreadable 0x561dd62d0840 2017-10-09 10:07:15.811541 7f796dbcc700 10 mds.6.journaler.pq(ro) _finish_read got 108868115~183789 2017-10-09 10:07:15.811561 7f796dbcc700 10 mds.6.journaler.pq(ro) _assimilate_prefetch 108868115~183789 2017-10-09 10:07:15.811565 7f796dbcc700 10 mds.6.journaler.pq(ro) _assimilate_prefetch read_buf now 108868115~183789, read pointers 108868115/109051904/134217728 2017-10-09 10:07:15.811624 7f796dbcc700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch 2017-10-09 10:07:15.811636 7f796dbcc700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos 2017-10-09 10:07:15.811640 7f796dbcc700 10 mds.beacon.mds9 set_want_state: up:replay -> down:damaged 2017-10-09 10:07:15.811647 7f796dbcc700 20 mds.beacon.mds9 0 slow request found 2017-10-09 10:07:15.811677 7f796dbcc700 10 mds.beacon.mds9 _send down:damaged seq 7 2017-10-09 10:07:15.811724 7f796dbcc700 20 mds.beacon.mds9 send_and_wait: awaiting 7 for up to 5s 2017-10-09 10:07:15.828725 7f7972bd6700 10 mds.beacon.mds9 handle_mds_beacon down:damaged seq 7 rtt 0.017043 2017-10-09 10:07:15.828782 7f796dbcc700 1 mds.mds9 respawn 2017-10-09 10:07:15.828796 7f796dbcc700 1 mds.mds9 e: 'ceph-mds' 2017-10-09 10:07:15.828798 7f796dbcc700 1 mds.mds9 0: 'ceph-mds' 2017-10-09 10:07:15.828801 7f796dbcc700 1 mds.mds9 1: '-i' 2017-10-09 10:07:15.828802 7f796dbcc700 1 mds.mds9 2: 'mds9' 2017-10-09 10:07:15.828805 7f796dbcc700 1 mds.mds9 3: '-f' 2017-10-09 10:07:15.828806 7f796dbcc700 1 mds.mds9 4: '--debug-mds=100' 2017-10-09 10:07:15.828807 7f796dbcc700 1 mds.mds9 5: '--debug-journaler=100' 2017-10-09 10:07:15.828848 7f796dbcc700 1 mds.mds9 respawning with exe /usr/bin/ceph-mds 2017-10-09 10:07:15.828851 7f796dbcc700 1 mds.mds9 exe_path /proc/self/exe 2017-10-09 10:07:15.854091 7f629cc1b240 0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 653 2017-10-09 10:07:15.857514 7f629cc1b240 0 pidfile_write: ignore empty --pid-file 2017-10-09 10:07:15.859999 7f629cc1b240 10 mds.mds9 112 MDSCacheObject 2017-10-09 10:07:15.860005 7f629cc1b240 10 mds.mds9 1624 CInode 2017-10-09 10:07:15.860006 7f629cc1b240 10 mds.mds9 16 elist<>::item *7=112 2017-10-09 10:07:15.860007 7f629cc1b240 10 mds.mds9 504 inode_t 2017-10-09 10:07:15.860008 7f629cc1b240 10 mds.mds9 48 nest_info_t 2017-10-09 10:07:15.860009 7f629cc1b240 10 mds.mds9 40 frag_info_t 2017-10-09 10:07:15.860011 7f629cc1b240 10 mds.mds9 40 SimpleLock *5=200 2017-10-09 10:07:15.860013 7f629cc1b240 10 mds.mds9 48 ScatterLock *3=144 2017-10-09 10:07:15.860014 7f629cc1b240 10 mds.mds9 456 CDentry 2017-10-09 10:07:15.860015 7f629cc1b240 10 mds.mds9 16 elist<>::item 2017-10-09 10:07:15.860016 7f629cc1b240 10 mds.mds9 40 SimpleLock 2017-10-09 10:07:15.860017 7f629cc1b240 10 mds.mds9 824 CDir 2017-10-09 10:07:15.860018 7f629cc1b240 10 mds.mds9 16 elist<>::item *2=32 2017-10-09 10:07:15.860019 7f629cc1b240 10 mds.mds9 232 fnode_t 2017-10-09 10:07:15.860020 7f629cc1b240 10 mds.mds9 48 nest_info_t *2 2017-10-09 10:07:15.860021 7f629cc1b240 10 mds.mds9 40 frag_info_t *2 2017-10-09 10:07:15.860022 7f629cc1b240 10 mds.mds9 272 Capability 2017-10-09 10:07:15.860023 7f629cc1b240 10 mds.mds9 32 xlist<>::item *2=64 2017-10-09 10:07:15.861045 7f629819a700 10 mds.mds9 MDSDaemon::ms_get_authorizer type=mon 2017-10-09 10:07:15.869017 7f629cc1b240 10 mds.beacon.mds9 _send up:boot seq 1 2017-10-09 10:07:15.869682 7f62959bd700 5 mds.mds9 handle_mds_map epoch 96022 from mon.0 2017-10-09 10:07:15.869722 7f62959bd700 10 mds.mds9 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2} 2017-10-09 10:07:15.869733 7f62959bd700 10 mds.mds9 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2} 2017-10-09 10:07:15.869739 7f62959bd700 10 mds.mds9 map says I am 147.87.226.189:6800/2626079655 mds.-1.-1 state ??? 2017-10-09 10:07:15.869745 7f62959bd700 10 mds.mds9 handle_mds_map: handling map in rankless mode 2017-10-09 10:07:15.869747 7f62959bd700 10 mds.mds9 not in map yet 2017-10-09 10:07:15.870364 7f6297999700 10 mds.mds9 MDSDaemon::ms_get_authorizer type=mgr 2017-10-09 10:07:16.602117 7f62959bd700 5 mds.mds9 handle_mds_map epoch 96023 from mon.0 2017-10-09 10:07:16.602144 7f62959bd700 10 mds.mds9 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2} 2017-10-09 10:07:16.602149 7f62959bd700 10 mds.mds9 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2} 2017-10-09 10:07:16.602153 7f62959bd700 10 mds.mds9 map says I am 147.87.226.189:6800/2626079655 mds.-1.-1 state ??? 2017-10-09 10:07:16.602158 7f62959bd700 10 mds.mds9 handle_mds_map: handling map in rankless mode 2017-10-09 10:07:16.602160 7f62959bd700 10 mds.mds9 not in map yet 2017-10-09 10:07:19.869141 7f62929b7700 10 mds.beacon.mds9 _send up:boot seq 2 2017-10-09 10:07:20.786598 7f62959bd700 5 mds.mds9 handle_mds_map epoch 96024 from mon.0 2017-10-09 10:07:20.786640 7f62959bd700 10 mds.mds9 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2} 2017-10-09 10:07:20.786647 7f62959bd700 10 mds.mds9 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2} 2017-10-09 10:07:20.786653 7f62959bd700 10 mds.mds9 map says I am 147.87.226.189:6800/2626079655 mds.-1.0 state up:standby 2017-10-09 10:07:20.786683 7f62959bd700 10 mds.mds9 handle_mds_map: handling map in rankless mode 2017-10-09 10:07:20.786700 7f62959bd700 10 mds.beacon.mds9 set_want_state: up:boot -> up:standby 2017-10-09 10:07:20.786703 7f62959bd700 1 mds.mds9 handle_mds_map standby 2017-10-09 10:07:20.787124 7f62959bd700 10 mds.beacon.mds9 handle_mds_beacon up:boot seq 2 rtt 0.917961 2017-10-09 10:07:23.869252 7f62929b7700 10 mds.beacon.mds9 _send up:standby seq 3
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com