On Wed, 2019-11-20 at 03:29 -0500, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
>
> In case the max_mds > 1 in MDS cluster and there is no any standby
> MDS and all the max_mds MDSs are in up:active state, if one of the
> up:active MDSs is dead, the m->m_num_laggy in kclient will be 1.
> Then the mount will fail without considering other healthy MDSs.
>
> Only when all the MDSs in the cluster are laggy will treat the
> cluster as not be available.
>
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  fs/ceph/mdsmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
> index 471bac335fae..8b4f93e5b468 100644
> --- a/fs/ceph/mdsmap.c
> +++ b/fs/ceph/mdsmap.c
> @@ -396,7 +396,7 @@ bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
>  		return false;
>  	if (m->m_damaged)
>  		return false;
> -	if (m->m_num_laggy > 0)
> +	if (m->m_num_laggy == m->m_num_mds)
>  		return false;
>  	for (i = 0; i < m->m_num_mds; i++) {
>  		if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE)

Given that laggy servers are still expected to be "in" the cluster,
should we just eliminate this check altogether? It seems like we'd still
want to allow a mount to occur even if the cluster is lagging.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
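
For reference, the three options under discussion (the current check, the
patched check, and dropping the check) can be compared in a small
userspace sketch. This is not the kernel code: the struct, the enums and
sketch_cluster_available() are hypothetical stand-ins, only the field
names m_num_mds, m_num_laggy and m_damaged come from the quoted diff, and
the "return true on the first active rank, else false" tail is an
assumption about the part of the function the hunk does not show.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-rank MDS state. */
enum sketch_mds_state { SKETCH_MDS_ACTIVE, SKETCH_MDS_OTHER };

/* Hypothetical stand-in for struct ceph_mdsmap; only the m_* field
 * names are taken from the quoted diff. */
struct sketch_mdsmap {
	int m_num_mds;
	int m_num_laggy;
	bool m_damaged;
	const enum sketch_mds_state *states;	/* one entry per rank */
};

/* The three ways of treating laggy MDSs discussed in this thread. */
enum laggy_policy {
	LAGGY_ANY_FAILS,	/* current code: m_num_laggy > 0 */
	LAGGY_ALL_FAILS,	/* the patch: m_num_laggy == m_num_mds */
	LAGGY_IGNORED,		/* the suggestion: no laggy check at all */
};

bool sketch_cluster_available(const struct sketch_mdsmap *m,
			      enum laggy_policy policy)
{
	int i;

	if (m->m_damaged)
		return false;
	if (policy == LAGGY_ANY_FAILS && m->m_num_laggy > 0)
		return false;
	if (policy == LAGGY_ALL_FAILS && m->m_num_laggy == m->m_num_mds)
		return false;

	/* Assumed tail: available as soon as one rank is active. */
	for (i = 0; i < m->m_num_mds; i++) {
		if (m->states[i] == SKETCH_MDS_ACTIVE)
			return true;
	}
	return false;
}
```

With two active ranks and one of them laggy, only LAGGY_ANY_FAILS
rejects the mount. With both ranks laggy, LAGGY_ALL_FAILS rejects it as
well, while LAGGY_IGNORED still allows it as long as a rank is reported
active.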