Re: [PATCH 2/3] mdsmap: fix mdsmap cluster available check based on laggy number

On 2019/11/22 1:30, Jeff Layton wrote:
On Wed, 2019-11-20 at 03:29 -0500, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

When max_mds > 1 in the MDS cluster, there are no standby MDSs, and
all max_mds MDSs are in the up:active state, then if one of the
up:active MDSs dies, m->m_num_laggy in the kclient will be 1 and the
mount will fail without considering the other, healthy MDSs.

Treat the cluster as unavailable only when all the MDSs in the
cluster are laggy.

Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
  fs/ceph/mdsmap.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
index 471bac335fae..8b4f93e5b468 100644
--- a/fs/ceph/mdsmap.c
+++ b/fs/ceph/mdsmap.c
@@ -396,7 +396,7 @@ bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
  		return false;
  	if (m->m_damaged)
  		return false;
-	if (m->m_num_laggy > 0)
+	if (m->m_num_laggy == m->m_num_mds)
  		return false;
  	for (i = 0; i < m->m_num_mds; i++) {
  		if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE)
Given that laggy servers are still expected to be "in" the cluster,
should we just eliminate this check altogether? It seems like we'd still
want to allow a mount to occur even if the cluster is lagging.

For this we need a way to distinguish between an MDS crash and a transient MDS lag. For now, in both cases the MDS stays "in" the cluster and remains in the "up:active & laggy" state.