Hello, I am interested in best-practice guidance for the following situation.

There is a Ceph cluster with CephFS deployed. Three servers are dedicated to running MDS daemons: one active, one standby-replay, and one standby. There is only a single rank. Sometimes these servers need to be rebooted for reasons unrelated to Ceph.

What is the proper procedure when restarting the node that currently hosts the active MDS daemon? The goal is to minimize client downtime. Ideally, clients should not notice anything, even if they are playing MP3s from the CephFS filesystem (note that I haven't tested this exact scenario). Is this achievable?

I tried running "ceph mds fail mds02" while mds02 was active and mds03 was standby-replay, to force the fail-over to mds03 so that I could then reboot mds02. Result: mds02 became standby, while mds03 went through the reconnect (30 seconds), rejoin (another 30 seconds), and replay (5 seconds) phases. During the reconnect and rejoin phases, the "Activity" column of "ceph fs status" is empty, which concerns me. It looks like I just caused 65 seconds of downtime. After all of that, mds02 became standby-replay, as expected.

Is there a better way? Or should I simply have rebooted mds02 without much thinking?

--
Alexander E. Patrakov
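P.S. In case it helps, here is roughly the sequence I ran (the filesystem name "cephfs" below is a placeholder; substitute the actual fs name):

    ceph fs status cephfs     # confirm: mds02 active, mds03 standby-replay, mds01 standby
    ceph mds fail mds02       # force the fail-over to the standby-replay daemon
    ceph fs status cephfs     # watch mds03 go through reconnect -> rejoin -> replay -> active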