On Wed, Aug 28, 2019 at 8:05 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Wed, 2019-08-28 at 17:48 +0800, chenerqi@xxxxxxxxx wrote: > > From: Erqi Chen <chenerqi@xxxxxxxxx> > > > > If client mds session is evicted in CEPH_MDS_SESSION_OPENING state, > > mds won't send session msg to client, and delayed_work skip > > CEPH_MDS_SESSION_OPENING state session, the session hang forever. > > ceph_con_keepalive reconnct connection for CEPH_MDS_SESSION_OPENING > > session to avoid session hang. > > > > Fixes: https://tracker.ceph.com/issues/41551 > > Signed-off-by: Erqi Chen chenerqi@xxxxxxxxx > > --- > > fs/ceph/mds_client.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > > index 920e9f0..eee4b63 100644 > > --- a/fs/ceph/mds_client.c > > +++ b/fs/ceph/mds_client.c > > @@ -4044,7 +4044,7 @@ static void delayed_work(struct work_struct *work) > > pr_info("mds%d hung\n", s->s_mds); > > } > > } > > - if (s->s_state < CEPH_MDS_SESSION_OPEN) { > > + if (s->s_state < CEPH_MDS_SESSION_OPENING) { > > /* this mds is failed or recovering, just wait */ > > ceph_put_mds_session(s); > > continue; > > Just for my own edification: > > OPENING == we've sent (or are sending) the session open request > OPEN == we've gotten the reply from the MDS and it was successful > > So in this case, the client got blacklisted after sending the request > but before the reply? Ok. > > So this should make it send a keepalive (or cap) message, at which point > the client discovers the connection is closed and then goes to reconnect > the session. This sounds sane to me, but I wonder if this would be > better expressed as: > > if (s->s_state == CEPH_MDS_SESSION_NEW) > should also avoid keepalive for CEPH_MDS_SESSION_RESTARTING and CEPH_MDS_SESSION_REJECTED > It always seems odd to me that we rely on the numerical values in this > enum. That said, we do that all over the code, so I'm inclined to just > merge this as-is (assuming Zheng concurs). > > -- > Jeff Layton <jlayton@xxxxxxxxxx> >