Re: CEPHFS - MDS graceful handover of rank 0

Hi,

In our experience failovers are largely transparent if the mds has:

    mds session blacklist on timeout = false
    mds session blacklist on evict = false

And clients have

    client reconnect stale = true
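
These can be set centrally via the config store, for example (underscore
option names as in Nautilus; "blacklist" became "blocklist" in later
releases):

    ceph config set mds mds_session_blacklist_on_timeout false
    ceph config set mds mds_session_blacklist_on_evict false
    ceph config set client client_reconnect_stale true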

Cheers, Dan

On Wed, Jan 27, 2021 at 9:09 AM Martin Hronek
<martin.hronek@xxxxxxxxxxxxxx> wrote:
>
> Hello fellow CEPH-users,
> currently we are updating our Ceph cluster (14.2.16) and making changes
> to some config settings.
>
> TLDR: is there a way to do a graceful shutdown of an active MDS node without
> losing the caps, open files and client connections? Something like handing
> over the active state, promoting a standby to active, ...?
>
>
> Sadly we ran into some difficulties when restarting MDS nodes. While we
> had two active nodes and one standby, we initially thought that this would
> give a nice handover when restarting the active rank ... sadly we saw
> how the node was going through the states
> replay-reconnect-rejoin-active, as nicely visualized here:
> https://docs.ceph.com/en/latest/cephfs/mds-states/
>
> This left some nodes running into timeouts until the standby node had gone
> into the active state again, most probably because the CephFS already has
> some 600k folders and 3M files; from the client side the failover took more
> than 30s.
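> 
> The rank's state transitions are easy to follow live with e.g.
> 
>     watch -n1 ceph fs status
> 
> while the failover happens.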
>
> So before the next MDS restart the FS config was changed to one active and
> one standby-replay node; the idea was that since the standby-replay node
> follows the active one, the handover would be smoother. The active state
> was indeed reached faster, but we still noticed some hiccups on the clients
> while the new active MDS was waiting for clients to reconnect (state
> up:reconnect) after the failover.
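> 
> For reference, the standby-replay part is a single setting ("cephfs"
> standing in for the actual fs name):
> 
>     ceph fs set cephfs allow_standby_replay true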
>
> The next idea was to do a manual node promotion, graceful shutdown or
> something similar - where the open caps and sessions would be handed
> over ... but I did not find any hint in the docs regarding this
> functionality.
> But this should somehow be possible (imho), since when adding a second
> active MDS node (max_mds 2) and then removing it again (max_mds 1), the
> rank 1 node goes into the stopping state and hands over all clients/caps
> to rank 0 without interruption for the clients.
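> 
> For reference, that experiment is just ("cephfs" again standing in for
> the actual fs name):
> 
>     ceph fs set cephfs max_mds 2   # rank 1 comes up and shares the tree
>     ceph fs set cephfs max_mds 1   # rank 1 enters up:stopping and hands
>                                    # its clients/caps back to rank 0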
>
> Therefore my question: how can one gracefully shut down an active rank 0
> MDS node, or promote a standby node to the active state, without losing
> open files/caps or client sessions?
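> 
> A plain failover can of course be forced with
> 
>     ceph mds fail 0
> 
> but that still sends the standby through the replay/reconnect states
> described above.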
>
> Thanks in advance,
> M
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx