Re: CEPHFS - MDS graceful handover of rank 0

On 1/27/21 9:08 AM, Martin Hronek wrote:


So before the next MDS restart the FS config was changed to one active and one standby-replay node; the idea was that, since the standby-replay node follows the active one, the handover would be smoother. The active state was reached faster, but we still noticed some hiccups on the clients while the new active MDS was waiting for clients to reconnect (state up:reconnect) after the failover.
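For reference, enabling standby-replay for a file system looks roughly like this, with <fs_name> being a placeholder for the actual file system name:

    ceph fs set <fs_name> allow_standby_replay true

With this set, one of the standby daemons starts following the active rank's journal so it can take over more quickly.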

The next idea was to do a manual node promotion, a graceful shutdown or something similar, where the open caps and sessions would be handed over ... but I did not find any hint in the docs regarding this functionality. It should somehow be possible (imho), since when adding a second active MDS node (max_mds 2) and then removing it again (max_mds 1), the rank 1 node goes into the stopping state and hands over all clients/caps to rank 0 without interruptions for the clients.
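That add-and-remove behaviour corresponds to something like the following, with <fs_name> again a placeholder:

    ceph fs set <fs_name> max_mds 2    # a standby picks up rank 1
    ceph fs set <fs_name> max_mds 1    # rank 1 goes to up:stopping and migrates its clients/caps back to rank 0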

Therefore my question: how can one gracefully shut down an active rank 0 MDS node, or promote a standby node to the active state, without losing open files/caps or client sessions?

The way to upgrade a cluster, and its current limitations, are described here [1]. The most relevant part for you:

Currently the MDS cluster does not have built-in versioning or file system flags to support seamless upgrades of the MDSs without potentially causing assertions or other faults due to incompatible messages or other functional differences. For this reason, it’s necessary during any cluster upgrade to reduce the number of active MDS for a file system to one first so that two active MDS do not communicate with different versions. Further, it’s also necessary to take standbys offline as any new CompatSet flags will propagate via the MDSMap to all MDS and cause older MDS to suicide.

So best practice is to have only _1_ active MDS, upgrade the software on that last running MDS, and then restart it.
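In practice the procedure from [1] boils down to a sketch like the one below; <fs_name> and the daemon ids are placeholders for your setup:

    # reduce to a single active MDS and disable standby-replay
    ceph fs set <fs_name> max_mds 1
    ceph fs set <fs_name> allow_standby_replay false

    # wait until only rank 0 is left (check with: ceph fs status),
    # then stop the standby daemons
    systemctl stop ceph-mds@<standby-id>

    # upgrade the packages on the host of the last running MDS and restart it
    systemctl restart ceph-mds@<active-id>

    # start the (upgraded) standbys again and restore your settings
    systemctl start ceph-mds@<standby-id>
    ceph fs set <fs_name> max_mds <previous value>

As you observed, clients will still go through the reconnect phase when the restarted MDS becomes active again.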

It would be *really* nice if this could be fixed in a newer version of Ceph. Probably not trivial, but AFAIK this is the only part of Ceph that has a noticeable impact during maintenance (like upgrades). If having this fixed is important to you, make sure you leave a note about it in the upcoming Ceph user survey.

Gr. Stefan

[1]: https://docs.ceph.com/en/latest/cephfs/upgrading/
