Re: ceph Pacific - MDS activity freezes when one of the MDSs is restarted

On 24/05/2023 21.15, Emmanuel Jaep wrote:
> Hi,
> 
> we are currently running a ceph fs cluster at the following version:
> MDS version: ceph version 16.2.10
> (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
> 
> The cluster is composed of 7 active MDSs and 1 standby MDS:
> RANK  STATE      MDS         ACTIVITY     DNS    INOS   DIRS   CAPS
>  0    active  icadmin012  Reqs:   73 /s  1938k  1880k  85.3k  92.8k
>  1    active  icadmin008  Reqs:  206 /s  2375k  2375k  7081    171k
>  2    active  icadmin007  Reqs:   91 /s  5709k  5256k   149k   299k
>  3    active  icadmin014  Reqs:   93 /s   679k   664k  40.1k   216k
>  4    active  icadmin013  Reqs:   86 /s  3585k  3569k  12.7k   197k
>  5    active  icadmin011  Reqs:   72 /s   225k   221k  8611    164k
>  6    active  icadmin015  Reqs:   87 /s  1682k  1610k  27.9k   274k
>       POOL         TYPE     USED  AVAIL
> cephfs_metadata  metadata  8552G  22.3T
>   cephfs_data      data     226T  22.3T
> STANDBY MDS
>  icadmin006
> 
> When I restart one of the active MDSs, the standby MDS becomes active and
> its state becomes "replay". So far, so good!
> 
> However, only one of the other "active" MDSs seems to remain truly active;
> request activity on all the others drops to (almost) zero:
> RANK  STATE      MDS         ACTIVITY     DNS    INOS   DIRS   CAPS
>  0    active  icadmin012  Reqs:    0 /s  1938k  1881k  85.3k  9720
>  1    active  icadmin008  Reqs:    0 /s  2375k  2375k  7080   2505
>  2    active  icadmin007  Reqs:    2 /s  5709k  5256k   149k  26.5k
>  3    active  icadmin014  Reqs:    0 /s   679k   664k  40.1k  3259
>  4    replay  icadmin006                  801k   801k  1279      0
>  5    active  icadmin011  Reqs:    0 /s   225k   221k  8611   9241
>  6    active  icadmin015  Reqs:    0 /s  1682k  1610k  27.9k  34.8k
>       POOL         TYPE     USED  AVAIL
> cephfs_metadata  metadata  8539G  22.8T
>   cephfs_data      data     225T  22.8T
> STANDBY MDS
>  icadmin013
> 
> In effect, the cluster becomes almost unavailable until the newly promoted
> MDS finishes rejoining the cluster.
> 
> Obviously, this defeats the purpose of having 7 MDSs.
> Is this the expected behavior?
> If not, which configuration items should I check to get back to "normal"
> operations?
> 

Please ignore my previous email; I read too quickly. I see you do have a
standby. However, a single standby does not give you fast failover with
multiple active MDSs.

For fast failover of any active MDS, you need one standby-replay daemon
for *each* active MDS. A standby-replay MDS follows a single rank only;
you can't have one standby-replay daemon following all ranks. What you
have right now is probably a regular standby daemon, which can take over
for any failed MDS, but only after sitting through the replay time.

See:

https://docs.ceph.com/en/latest/cephfs/standby/#configuring-standby-replay
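
As a rough sketch (assuming your filesystem is named "cephfs"; substitute
whatever "ceph fs ls" reports), enabling it boils down to:

    # enable standby-replay followers for this filesystem
    ceph fs set cephfs allow_standby_replay true

    # afterwards, "ceph fs status" should show standby daemons being
    # assigned to ranks in the "standby-replay" state instead of just
    # sitting under STANDBY MDS

You would then need to run enough standby daemons that each of your 7
active ranks can get its own standby-replay follower, plus ideally one
spare regular standby on top.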

My explanation for the zero ops from the previous email still holds:
it's likely that most clients will hang if any MDS rank is down/unavailable.
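
If you want to watch what the clients are stuck waiting on, the replacement
daemon's progress through the recovery states is visible from the monitors
(a rough sketch; the 2-second interval is arbitrary):

    # watch the rank states during an MDS restart; the replacement walks
    # through replay -> resolve -> reconnect -> rejoin before going active
    watch -n 2 'ceph fs status; ceph health detail'

Until the affected rank is back to "active", most client I/O that needs
metadata from that rank will simply block.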

- Hector
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


