MDS failover very slow the first time, but very fast the second time


 



Hi all, I have a Ceph cluster running Luminous 12.2.5.
In this cluster, CephFS is configured with two MDS servers: ceph-mds-test04 is active and ceph-mds-test05 is standby.
Here is the MDS configuration:
[mds]
mds_cache_size = 1000000
mds_cache_memory_limit = 42949672960
mds_standby_replay = true
mds_beacon_grace = 300
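
(These are the values from ceph.conf; the effective values on the running daemons can be double-checked over the admin socket on the MDS host, roughly as below. The daemon id is assumed to match the server name used above; adjust it if your mds id differs.)

# verify the effective cache and beacon settings on the running MDS
ceph daemon mds.ceph-mds-test04 config get mds_cache_memory_limit
ceph daemon mds.ceph-mds-test04 config get mds_beacon_grace
# or dump everything and filter
ceph daemon mds.ceph-mds-test04 config show | grep -E 'mds_cache|mds_beacon|mds_standby'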
 

I created 100 million files because I want to figure out how long a failover takes at that scale.
The first time, the failover to ceph-mds-test05 failed because of a timeout. I triggered it with:
ceph mds fail ceph-mds-test04
 
Here is the log:
2018-12-18 10:58:38.164369 7fd696a9d700  1 mds.0.6382 handle_mds_map i am now mds.0.6382
2018-12-18 10:58:38.164374 7fd696a9d700  1 mds.0.6382 handle_mds_map state change up:reconnect --> up:rejoin
2018-12-18 10:58:38.164394 7fd696a9d700  1 mds.0.6382 rejoin_start 
2018-12-18 11:03:40.583521 7fd697a9f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 300
2018-12-18 11:03:41.490589 7fd693a97700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 300
2018-12-18 11:03:45.490645 7fd693a97700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 300
2018-12-18 11:03:45.583601 7fd697a9f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 300
2018-12-18 11:03:49.490687 7fd693a97700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 300
2018-12-18 11:03:50.583665 7fd697a9f700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 300
2018-12-18 11:03:53.490744 7fd693a97700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 300
2018-12-18 11:03:54.996326 7fd696a9d700  1 mds.0.6382 rejoin_joint_start
2018-12-18 11:03:55.001907 7fd694298700  1 heartbeat_map reset_timeout 'MDSRank' had timed out after 300 
2018-12-18 11:03:55.002064 7fd696a9d700  0 mds.beacon.zw01-data-hadoop-ceph-test05 handle_mds_beacon no longer laggy 
2018-12-18 11:03:56.767320 7fd69aa15700  0 -- 10.130.212.14:6800/543171573 >> 10.130.213.8:0/2865474356 conn(0x7fd6b2680000 :6800 s=STATE_OPEN pgs=13123 cs=1 l=0).fault server, going to standby 
2018-12-18 11:03:56.798865 7fd696a9d700  1 mds.zw01-data-hadoop-ceph-test05 map removed me (mds.-1 gid:824096) from cluster due to lost contact; respawning 
2018-12-18 11:03:56.798874 7fd696a9d700  1 mds.zw01-data-hadoop-ceph-test05 respawn
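
(While the MDS sat in up:rejoin like this, its progress could have been watched from the admin socket on the standby host. A rough sketch, using the daemon id from the log above; if your build's perf dump does not accept a section argument, pipe the plain perf dump output through grep instead.)

# current MDS state (up:reconnect / up:rejoin / up:active ...)
ceph daemon mds.zw01-data-hadoop-ceph-test05 status
# cached inode/dentry counts grow while the cache is rebuilt during rejoin
ceph daemon mds.zw01-data-hadoop-ceph-test05 perf dump mds_mem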

But when I triggered the failover a second time, it finished almost immediately:
2018-12-18 11:11:37.704956 7f58061aa700  1 mds.0.6394 reconnect_done
2018-12-18 11:11:38.402853 7f58061aa700  1 mds.0.6394 handle_mds_map i am now mds.0.6394
2018-12-18 11:11:38.402856 7f58061aa700  1 mds.0.6394 handle_mds_map state change up:reconnect --> up:rejoin
2018-12-18 11:11:38.402860 7f58061aa700  1 mds.0.6394 rejoin_start
2018-12-18 11:11:38.405550 7f58061aa700  1 mds.0.6394 rejoin_joint_start
2018-12-18 11:11:38.430299 7f58061aa700  1 mds.0.6394 rejoin_done
2018-12-18 11:11:39.486981 7f58061aa700  1 mds.0.6394 handle_mds_map i am now mds.0.6394
2018-12-18 11:11:39.486984 7f58061aa700  1 mds.0.6394 handle_mds_map state change up:rejoin --> up:active
2018-12-18 11:11:39.486990 7f58061aa700  1 mds.0.6394 recovery_done -- successful recovery!
2018-12-18 11:11:39.487131 7f58061aa700  1 mds.0.6394 active_start
2018-12-18 11:11:39.496333 7f58061aa700  1 mds.0.6394 cluster recovered.
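
(For anyone who wants to reproduce this, the failover can be timed roughly with the sketch below; the grep pattern assumes the Luminous-style "ceph mds stat" output, e.g. "cephfs-1/1/1 up {0=ceph-mds-test05=up:active}".)

start=$(date +%s)
ceph mds fail ceph-mds-test04
sleep 5   # give the monitor a moment to mark the rank as failed
# wait until the mdsmap reports an active MDS again
until ceph mds stat 2>/dev/null | grep -q 'up:active'; do sleep 1; done
echo "failover took $(( $(date +%s) - start )) seconds"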

Could someone explain this behavior? Thanks a lot!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
