I concur, having this heavily co-located set-up will not perform any
better than what you observe. Do you really have 2 MDS daemons per host?
I just saw that you have only 2 disks, probably 1 per node. In this
set-up, you cannot really expect good fail-over times due to the number
of simultaneous failures that need to be handled:

- MON fail
- OSD fail
- MDS fail
- MGR fail
- (all!) PGs become degraded
- FS data and meta data are on the same disks, so journal replay, data
  IO and meta data IO all go to the same drive(s)

There are plenty of bottlenecks in this set-up that render this test
highly unrealistic. You should try to get more production-ready
hardware; a 2-disk ceph cluster isn't. I wouldn't waste time trying to
tune configs for a small test case; these changes will not do any good
on proper production systems. The set-up you have is good for learning
to administrate ceph, but it does not provide a point of comparison
with a production system and will show heavily degraded performance.
Ceph requires a not exactly small minimum size before it starts working
well.
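If you want to see how much is failing simultaneously, it can be
instructive to watch the cluster while you reboot one of the nodes.
A minimal sketch (assuming your file system is named cephfs; adjust
the name to yours):

  # terminal 1: stream cluster log events while the node reboots
  ceph -w

  # terminal 2: poll MDS and health state once per second
  # ("cephfs" is a placeholder for your file system's name)
  watch -n 1 'ceph fs status cephfs; ceph health detail'

You should see the MON election, OSDs getting marked down, PGs going
degraded and the MDS rank walking through replay/reconnect/rejoin
before the stand-by becomes active - all of it competing for the same
two disks.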
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 03 May 2021 20:53:51
To: ceph-users@xxxxxxx
Subject: Re: [ Ceph MDS MON Config Variables ] Failover Delay issue

I wouldn't recommend a colocated MDS in a production environment.

Zitat von Lokendra Rathour <lokendrarathour@xxxxxxxxx>:

> Hello Frank,
> Thanks for your inputs.
>
> *Responding to your queries, kindly refer below:*
>
> - *Do you have services co-located?*
>   [loke]: Yes, they are co-located:
>   - Cephnode1: MDS, MGR, MON, RGW, OSD, MDS
>   - Cephnode2: MDS, MGR, MON, RGW, OSD, MDS
>   - Cephnode3: MON
> - *Which of the times (1) or (2) are you referring to?*
>   For your part (1): we count the time from when the I/O stops until
>   the I/O resumes, which includes:
>   - the call for a new mon election
>   - the election of the mon leader
>   - the new mon leader promoting the standby MDS to active
>   - resuming the stuck I/O threads
>   - other internal processing (I am only pointing out what I could
>     read from the logs)
> - *How many FS clients do you have?*
>   We are testing with only one client at the moment, mounted with the
>   native fs driver, where we pass the IP addresses of both MDS
>   daemons (in our case both ceph nodes) as follows:
>   sudo mount -t ceph 10.0.4.10,10.0.4.11:6789:/volumes/path/
>   /mnt/cephconf -o
>   name=foo,secret=AQAus49gdCHvIxAAB89BcDYqYSqJ8yOJBg5grw==
>
> *One input:* if we only shut down the active MDS daemon, i.e. we do
> not reboot the physical node but only stop the MDS service, we see
> only 4-7 seconds.
> When we reboot a physical node, Cephnode1 or Cephnode2 (MON, MGR, RGW
> and OSD get rebooted along with the MDS), we see around 40 seconds.
>
> Best Regards,
> Lokendra
>
>
> On Mon, May 3, 2021 at 10:30 PM Frank Schilder <frans@xxxxxx> wrote:
>
>> Following up on this and other comments, there are 2 different time
>> delays. One (1) is the time it takes from killing an MDS until a
>> stand-by is made an active rank, and (2) is the time it takes for
>> the new active rank to restore all client sessions. My experience is
>> that (1) takes close to 0 seconds while (2) can take between 20-30
>> seconds depending on how busy the clients are; the MDS will go
>> through various states before reaching active. We usually have ca.
>> 1600 client connections to our FS. With fewer clients, MDS fail-over
>> is practically instantaneous. We are using latest mimic.
>>
>> From what you write, you seem to have a 40 seconds window for (1),
>> which points to a problem different from MON config values. This is
>> supported by your description including a MON election (??? this
>> should never happen). Do you have services co-located? Which of the
>> times (1) or (2) are you referring to? How many FS clients do you
>> have?
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
>> Sent: 03 May 2021 17:19:37
>> To: Lokendra Rathour
>> Cc: Ceph Development; dev; ceph-users
>> Subject: Re: [ Ceph MDS MON Config Variables ] Failover Delay issue
>>
>> On Mon, May 3, 2021 at 6:36 AM Lokendra Rathour
>> <lokendrarathour@xxxxxxxxx> wrote:
>> >
>> > Hi Team,
>> > I was setting up the ceph cluster with
>> >
>> > - Node details: 3 MON, 2 MDS, 2 MGR, 2 RGW
>> > - Deployment type: active/standby
>> > - Testing mode: failover of the MDS node
>> > - Setup: Octopus (15.2.7)
>> > - OS: CentOS 8.3
>> > - Hardware: HP
>> > - RAM: 128 GB on each node
>> > - OSD: 2 (1 TB each)
>> > - Operation: normal I/O with a mkdir every 1 second.
>> >
>> > *Test case: power off the active MDS node for failover to happen*
>> >
>> > *Observation:*
>> > We have observed that whenever the active MDS node goes down, it
>> > takes around *40 seconds* to activate the standby MDS node.
>> > On further checking the logs of the newly active MDS node, we have
>> > seen the delay break down as follows:
>> >
>> > 1. A 10 second delay, after which the MON calls for a new monitor
>> >    election:
>> >    [log] 0 log_channel(cluster) log [INF] : mon.cephnode1 calling
>> >    monitor election
>>
>> In the process of killing the active MDS, are you also killing a
>> monitor?
>>
>> --
>> Patrick Donnelly, Ph.D.
>> He / Him / His
>> Principal Software Engineer
>> Red Hat Sunnyvale, CA
>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
> --
> ~ Lokendra
> www.inertiaspeaks.com
> www.inertiagroups.com
> skype: lokendrarathour

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx