Re: [ Ceph MDS MON Config Variables ] Failover Delay issue

Also, there's a difference between 'standby-replay' (hot standby) and plain 'standby'. We have been using CephFS with standby-replay for a couple of years now, and failover takes a couple of seconds at most, depending on the current load. Have you tried enabling standby-replay and testing the failover?

ceph fs set cephfs allow_standby_replay true
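
To confirm that the setting took effect before testing, something like the following should work. It is only a minimal sketch and assumes the file system is named cephfs, as in the command above.

```shell
# Dump the file system's MDS map, including its flags
ceph fs get cephfs

# List the MDS daemons and their states; a hot standby is reported as
# "standby-replay" rather than plain "standby"
ceph fs status cephfs
```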


Quoting Olivier AUDRY <oaudry@xxxxxxxxxxx>:

hello

perhaps you should have more than one MDS active.

mds: cephfs:3 {0=cephfs-d=up:active,1=cephfs-e=up:active,2=cephfs-a=up:active} 1 up:standby-replay

I have 3 active MDS daemons and one in standby-replay.

I'm using Rook in Kubernetes for this setup.
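
For reference, outside of Rook the number of active MDS ranks is controlled by max_mds. The sketch below assumes the file system is named cephfs, as in the status line above. Under Rook the operator normally manages this through the CephFilesystem resource (metadataServer.activeCount), so a value set by hand may be reverted there.

```shell
# Allow up to 3 active MDS ranks; any remaining daemons stay as standbys
ceph fs set cephfs max_mds 3

# Check which ranks are active and which daemons are standing by
ceph mds stat
```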

oau

On Monday, 03 May 2021 at 19:06 +0530, Lokendra Rathour wrote:
Hi Team,
I was setting up a Ceph cluster with the following configuration:

   - Node details: 3 Mon, 2 MDS, 2 Mgr, 2 RGW
   - Deployment type: Active/Standby
   - Testing mode: Failover of MDS node
   - Setup: Octopus (15.2.7)
   - OS: CentOS 8.3
   - Hardware: HP
   - RAM: 128 GB on each node
   - OSD: 2 (1 TB each)
   - Operation: Normal I/O, with a mkdir every second

*Test Case: Power off any active MDS node so that failover happens*

*Observation:*
We have observed that whenever an active MDS node goes down, it takes
around *40 seconds* to activate the standby MDS node.
On further checking the logs of the MDS node that takes over, we have
seen delays made up of the following:

   1. 10-second delay, after which the Mon calls for a new Monitor election
      1. [log] 0 log_channel(cluster) log [INF] : mon.cephnode1 calling monitor election
   2. 5-second delay while the new Monitor leader is elected
      1. [log] 0 log_channel(cluster) log [INF] : mon.cephnode1 is new leader, mons cephnode1,cephnode3 in quorum (ranks 0,2)
   3. The additional beacon grace time that the system waits before it
      allows the standby MDS node to be activated (approx. delay of 19
      seconds)
      1. defaults: sudo ceph config get mon mds_beacon_grace
         15.000000
      2. sudo ceph config get mon mds_beacon_interval
         5.000000
      3. [log] - 2021-04-30T18:23:10.136+0530 7f4e3925c700  1 mon.cephnode2@1(leader).mds e776 no beacon from mds.0.771 (gid: 639443 addr: [v2:10.0.4.10:6800/2172152716,v1:10.0.4.10:6801/2172152716] state: up:active) *since 18.7951*
   4. *In total, it takes around 40 seconds to hand over to and activate
      the passive standby node.*
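
As a rough cross-check, the three delays listed above can be added up. This is only a sketch of the arithmetic using the figures from the list; the variable names are made up for illustration, and treating the observed failover as exactly 40 seconds is an assumption.

```shell
#!/bin/sh
# Delays in seconds, taken from the observation above
pre_election_delay=10   # wait before the Mon calls for an election
election_delay=5        # time until the new Mon leader is elected
beacon_wait=19          # approx. wait before the missing beacon is acted on

observed_total=40       # "around 40 seconds" (assumed to be exactly 40 here)

total=$((pre_election_delay + election_delay + beacon_wait))
unaccounted=$((observed_total - total))

echo "explained by the three steps: ${total}s"
echo "not explained by the logs above: ${unaccounted}s"
```

The remainder is presumably spent while the standby MDS itself takes over the rank, but that is an assumption and is not shown in the logs above.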

*Query:*

   1. Can these variables be configured? We have tried changing them, but
      we are not aware of the overall impact of these changes on the Ceph
      cluster.
      1. By tuning these values we could reach a minimum time of 12
         seconds in which the new active node comes up.
      2. Values used to get that time:
         1. *mon_election_timeout* (default 5) - configured as 1
         2. *mon_lease* (default 5) - configured as 2
         3. *mds_beacon_grace* (default 15) - configured as 5
         4. *mds_beacon_interval* (default 5) - configured as 1
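
The values listed above could be applied with ceph config set roughly as follows. This is a minimal sketch rather than a recommendation: the run helper and the APPLY variable are hypothetical, and putting the beacon settings under "global" (so that the monitors and the MDS daemons see the same values) is an assumption. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry run by default: print each command; set APPLY=1 to really execute it
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Monitor-side timers (values from the list above)
run ceph config set mon mon_election_timeout 1
run ceph config set mon mon_lease 2

# MDS beacon timers; the "global" section is an assumption of this sketch
run ceph config set global mds_beacon_grace 5
run ceph config set global mds_beacon_interval 1
```

Running it once without APPLY=1 makes it easy to review the exact commands before they touch the cluster.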

We need to tune this setup to get the failover duration as low as 5-7
seconds.

Please suggest/support and share your inputs. My setup is ready, and we
are already testing multiple scenarios so that we can achieve the
minimum failover duration.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

