On Tue, 2019-12-03 at 09:29 -0500, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
>
> The possible max rank, it maybe larger than the m->m_num_mds,
> for example if the mds_max == 2 in the cluster, when the MDS(0)
> was laggy and being replaced by a new MDS, we will temporarily
> receive a new mds map with n_num_mds == 1 and the active MDS(1),
> and the mds rank >= m->m_num_mds.
>
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  fs/ceph/mdsmap.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
> index 284d68646c40..a77e0ecb9a6b 100644
> --- a/fs/ceph/mdsmap.c
> +++ b/fs/ceph/mdsmap.c
> @@ -129,6 +129,7 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end)
>  	int err;
>  	u8 mdsmap_v, mdsmap_cv;
>  	u16 mdsmap_ev;
> +	u32 possible_max_rank;
>
>  	m = kzalloc(sizeof(*m), GFP_NOFS);
>  	if (!m)
> @@ -164,6 +165,15 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end)
>  	m->m_num_mds = n = ceph_decode_32(p);
>  	m->m_num_active_mds = m->m_num_mds;
>
> +	/*
> +	 * the possible max rank, it maybe larger than the m->m_num_mds,
> +	 * for example if the mds_max == 2 in the cluster, when the MDS(0)
> +	 * was laggy and being replaced by a new MDS, we will temporarily
> +	 * receive a new mds map with n_num_mds == 1 and the active MDS(1),
> +	 * and the mds rank >= m->m_num_mds.
> +	 */
> +	possible_max_rank = max((u32)m->m_num_mds, m->m_max_mds);
> +
>  	m->m_info = kcalloc(m->m_num_mds, sizeof(*m->m_info), GFP_NOFS);
>  	if (!m->m_info)
>  		goto nomem;
> @@ -238,7 +248,7 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end)
>  		     ceph_mds_state_name(state),
>  		     laggy ? "(laggy)" : "");
>
> -		if (mds < 0 || mds >= m->m_num_mds) {
> +		if (mds < 0 || mds >= possible_max_rank) {
>  			pr_warn("mdsmap_decode got incorrect mds(%d)\n", mds);
>  			continue;
>  		}

Thanks, Xiubo. I'll squash this one into your earlier ceph_mdsmap_decode
patch, since it's fixing that logic up.
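
For anyone following along, the relaxed bound can be sketched in
user-space C roughly as below. This is only an illustration of the check
in the patch, not the kernel code: struct mdsmap_counts and both helper
functions are hypothetical stand-ins, and the field types (a signed
m_num_mds, an unsigned 32-bit m_max_mds) are assumed from the (u32) cast
in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two struct ceph_mdsmap fields the
 * check depends on. The field types are assumptions.
 */
struct mdsmap_counts {
	int32_t  m_num_mds;	/* number of MDS entries in the decoded map */
	uint32_t m_max_mds;	/* configured maximum number of ranks */
};

/*
 * Mirrors max((u32)m->m_num_mds, m->m_max_mds) from the patch:
 * the upper bound on ranks is the larger of the two counts.
 */
static uint32_t possible_max_rank(const struct mdsmap_counts *m)
{
	uint32_t num = (uint32_t)m->m_num_mds;

	return num > m->m_max_mds ? num : m->m_max_mds;
}

/* A rank is accepted if it is non-negative and below the bound. */
static bool rank_is_valid(const struct mdsmap_counts *m, int32_t mds)
{
	return mds >= 0 && (uint32_t)mds < possible_max_rank(m);
}
```

With m_num_mds == 1 and m_max_mds == 2, the scenario from the commit
message, rank 1 passes this check, whereas the old comparison against
m_num_mds alone would have rejected it.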
-- Jeff Layton <jlayton@xxxxxxxxxx>