I concur, having this heavily co-located set-up will not perform any
better than what you observe. Do you really have 2 MDS daemons per host?
I just saw that you have only 2 disks, probably 1 per node. In this
set-up, you cannot really expect good fail-over times due to the number
of simultaneous failures that need to be handled:

- MON fail
- OSD fail
- MDS fail
- MGR fail
- (all!) PGs become degraded
- FS data and meta data are on the same disks, so journal replay, data
  IO and meta data IO all go to the same drive(s)

There are plenty of bottlenecks in this set-up that render this test
highly unrealistic. You should try to get more production-ready
hardware; a 2-disk ceph cluster isn't. I wouldn't waste time trying to
tune configs for a small test case; these changes will not do any good
on proper production systems. The set-up you have is good for learning
to administrate ceph, but it does not provide a point of comparison
with a production system and will show heavily degraded performance.
Ceph requires a not exactly small minimum size before it starts working
well.
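If you want to see how much is failing simultaneously, it can be
instructive to watch the cluster while you reboot one of the nodes.
A minimal sketch (assuming your file system is named cephfs; adjust
the name to yours):

  # terminal 1: stream cluster log events while the node reboots
  ceph -w

  # terminal 2: poll MDS and health state once per second
  # ("cephfs" is a placeholder for your file system's name)
  watch -n 1 'ceph fs status cephfs; ceph health detail'

You should see the MON election, OSDs getting marked down, PGs going
degraded and the MDS rank walking through replay/reconnect/rejoin
before the stand-by becomes active - all of it competing for the same
two disks.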
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 03 May 2021 20:53:51
To: ceph-users@xxxxxxx
Subject: Re: [ Ceph MDS MON Config Variables ] Failover Delay issue

I wouldn't recommend a colocated MDS in a production environment.

Zitat von Lokendra Rathour <lokendrarathour@xxxxxxxxx>:

> Hello Frank,
> Thanks for your inputs.
>
> *Responding to your queries, kindly refer below:*
>
> - *Do you have services co-located?*
>   [loke]: Yes, they are co-located:
>   - Cephnode1: MDS, MGR, MON, RGW, OSD, MDS
>   - Cephnode2: MDS, MGR, MON, RGW, OSD, MDS
>   - Cephnode3: MON
> - *Which of the times (1) or (2) are you referring to?*
>   For your part (1): we count the time from when the I/O stops until
>   the I/O resumes, which includes:
>   - the call for a new mon election
>   - the election of the mon leader
>   - the new mon leader promoting the standby MDS to active
>   - resuming the stuck I/O threads
>   - other internal processing (I am only pointing out what I could
>     read from the logs)
> - *How many FS clients do you have?*
>   We are testing with only one client at the moment, mounted with the
>   native fs driver, where we pass the IP addresses of both MDS
>   daemons (in our case both ceph nodes) as follows:
>   sudo mount -t ceph 10.0.4.10,10.0.4.11:6789:/volumes/path/
>   /mnt/cephconf -o
>   name=foo,secret=AQAus49gdCHvIxAAB89BcDYqYSqJ8yOJBg5grw==
>
> *One input:* if we only shut down the active MDS daemon, i.e. we do
> not reboot the physical node but only stop the MDS service, we see
> only 4-7 seconds.
> When we reboot a physical node, Cephnode1 or Cephnode2 (MON, MGR, RGW
> and OSD get rebooted along with the MDS), we see around 40 seconds.
>
> Best Regards,
> Lokendra
>
>
> On Mon, May 3, 2021 at 10:30 PM Frank Schilder <frans@xxxxxx> wrote:
>
>> Following up on this and other comments, there are 2 different time
>> delays. One (1) is the time it takes from killing an MDS until a
>> stand-by is made an active rank, and (2) is the time it takes for
>> the new active rank to restore all client sessions. My experience is
>> that (1) takes close to 0 seconds while (2) can take between 20-30
>> seconds depending on how busy the clients are; the MDS will go
>> through various states before reaching active. We usually have ca.
>> 1600 client connections to our FS. With fewer clients, MDS fail-over
>> is practically instantaneous. We are using latest mimic.
>>
>> From what you write, you seem to have a 40 seconds window for (1),
>> which points to a problem different from MON config values. This is
>> supported by your description including a MON election (??? this
>> should never happen). Do you have services co-located? Which of the
>> times (1) or (2) are you referring to? How many FS clients do you
>> have?
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
>> Sent: 03 May 2021 17:19:37
>> To: Lokendra Rathour
>> Cc: Ceph Development; dev; ceph-users
>> Subject: Re: [ Ceph MDS MON Config Variables ] Failover Delay issue
>>
>> On Mon, May 3, 2021 at 6:36 AM Lokendra Rathour
>> <lokendrarathour@xxxxxxxxx> wrote:
>> >
>> > Hi Team,
>> > I was setting up the ceph cluster with
>> >
>> > - Node details: 3 MON, 2 MDS, 2 MGR, 2 RGW
>> > - Deployment type: active/standby
>> > - Testing mode: failover of the MDS node
>> > - Setup: Octopus (15.2.7)
>> > - OS: CentOS 8.3
>> > - Hardware: HP
>> > - RAM: 128 GB on each node
>> > - OSD: 2 (1 TB each)
>> > - Operation: normal I/O with a mkdir every 1 second.
>> >
>> > *Test case: power off the active MDS node for failover to happen*
>> >
>> > *Observation:*
>> > We have observed that whenever the active MDS node goes down, it
>> > takes around *40 seconds* to activate the standby MDS node.
>> > On further checking the logs of the newly active MDS node, we have
>> > seen the delay break down as follows:
>> >
>> > 1. A 10 second delay, after which the MON calls for a new monitor
>> >    election:
>> >    [log] 0 log_channel(cluster) log [INF] : mon.cephnode1 calling
>> >    monitor election
>>
>> In the process of killing the active MDS, are you also killing a
>> monitor?
>>
>> --
>> Patrick Donnelly, Ph.D.
>> He / Him / His
>> Principal Software Engineer
>> Red Hat Sunnyvale, CA
>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
> --
> ~ Lokendra
> www.inertiaspeaks.com
> www.inertiagroups.com
> skype: lokendrarathour

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx