On 24/05/2023 21.15, Emmanuel Jaep wrote:
> Hi,
>
> we are currently running a ceph fs cluster at the following version:
> MDS version: ceph version 16.2.10
> (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
>
> The cluster is composed of 7 active MDSs and 1 standby MDS:
> RANK  STATE      MDS         ACTIVITY        DNS    INOS   DIRS   CAPS
>  0    active     icadmin012  Reqs:   73 /s   1938k  1880k  85.3k  92.8k
>  1    active     icadmin008  Reqs:  206 /s   2375k  2375k  7081    171k
>  2    active     icadmin007  Reqs:   91 /s   5709k  5256k   149k   299k
>  3    active     icadmin014  Reqs:   93 /s    679k   664k  40.1k   216k
>  4    active     icadmin013  Reqs:   86 /s   3585k  3569k  12.7k   197k
>  5    active     icadmin011  Reqs:   72 /s    225k   221k  8611    164k
>  6    active     icadmin015  Reqs:   87 /s   1682k  1610k  27.9k   274k
>       POOL          TYPE      USED   AVAIL
> cephfs_metadata    metadata   8552G  22.3T
>   cephfs_data        data      226T  22.3T
> STANDBY MDS
>  icadmin006
>
> When I restart one of the active MDSs, the standby MDS becomes active and
> its state becomes "replay". So far, so good!
>
> However, only one of the other "active" MDSs seems to remain active. All
> activity drops from the other ones:
> RANK  STATE      MDS         ACTIVITY        DNS    INOS   DIRS   CAPS
>  0    active     icadmin012  Reqs:    0 /s   1938k  1881k  85.3k  9720
>  1    active     icadmin008  Reqs:    0 /s   2375k  2375k  7080   2505
>  2    active     icadmin007  Reqs:    2 /s   5709k  5256k   149k  26.5k
>  3    active     icadmin014  Reqs:    0 /s    679k   664k  40.1k  3259
>  4    replay     icadmin006                   801k   801k  1279   0
>  5    active     icadmin011  Reqs:    0 /s    225k   221k  8611   9241
>  6    active     icadmin015  Reqs:    0 /s   1682k  1610k  27.9k  34.8k
>       POOL          TYPE      USED   AVAIL
> cephfs_metadata    metadata   8539G  22.8T
>   cephfs_data        data      225T  22.8T
> STANDBY MDS
>  icadmin013
>
> In effect, the cluster becomes almost unavailable until the newly promoted
> MDS finishes rejoining the cluster.
>
> Obviously, this defeats the purpose of having 7 MDSs.
> Is this expected behavior?
> If not, what configuration items should I check to go back to "normal"
> operations?

Please ignore my previous email, I read too quickly. I see you do have a
standby. However, that does not allow fast failover with multiple MDSes.

For fast failover of any active MDS, you need one standby-replay daemon for
*each* active MDS. Each standby-replay MDS follows a single active rank
only; you can't have one standby-replay daemon following all ranks. What you
have right now is probably a regular standby daemon, which can take over any
failed MDS, but it has to go through the replay phase first.

See:
https://docs.ceph.com/en/latest/cephfs/standby/#configuring-standby-replay

My explanation for the zero ops from the previous email still holds: it's
likely that most clients will hang if any MDS rank is down/unavailable.

- Hector
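P.S. In case it helps, a minimal sketch of what enabling standby-replay
looks like on the CLI, assuming the filesystem is named "cephfs" (guessed
from your pool names) and you have already deployed enough spare MDS
daemons, one per active rank:

    # let idle standby daemons each follow one active rank and tail its
    # journal continuously instead of replaying it only after a failure
    ceph fs set cephfs allow_standby_replay true

    # verify: the followers should now show up as "standby-replay" next
    # to the rank they are tracking
    ceph fs status cephfs

With 7 active ranks that means 7 extra daemons dedicated to standby-replay,
ideally plus at least one plain standby as a spare.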