On Wed, 2019-08-28 at 17:48 +0800, chenerqi@xxxxxxxxx wrote:
> From: Erqi Chen <chenerqi@xxxxxxxxx>
>
> If client mds session is evicted in CEPH_MDS_SESSION_OPENING state,
> mds won't send session msg to client, and delayed_work skip
> CEPH_MDS_SESSION_OPENING state session, the session hang forever.
> ceph_con_keepalive reconnect connection for CEPH_MDS_SESSION_OPENING
> session to avoid session hang.
>
> Fixes: https://tracker.ceph.com/issues/41551
> Signed-off-by: Erqi Chen <chenerqi@xxxxxxxxx>
> ---
>  fs/ceph/mds_client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 920e9f0..eee4b63 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4044,7 +4044,7 @@ static void delayed_work(struct work_struct *work)
>  				pr_info("mds%d hung\n", s->s_mds);
>  			}
>  		}
> -		if (s->s_state < CEPH_MDS_SESSION_OPEN) {
> +		if (s->s_state < CEPH_MDS_SESSION_OPENING) {
>  			/* this mds is failed or recovering, just wait */
>  			ceph_put_mds_session(s);
>  			continue;

Just for my own edification:

OPENING == we've sent (or are sending) the session open request
OPEN == we've gotten the reply from the MDS and it was successful

So in this case, the client got blacklisted after sending the request
but before the reply? Ok.

So this should make it send a keepalive (or cap) message, at which point
the client discovers the connection is closed and then goes to reconnect
the session. This sounds sane to me, but I wonder if this would be
better expressed as:

    if (s->s_state == CEPH_MDS_SESSION_NEW)

It always seems odd to me that we rely on the numerical values in this
enum. That said, we do that all over the code, so I'm inclined to just
merge this as-is (assuming Zheng concurs).
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
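
For illustration, the three variants of the check discussed above can be compared in a small standalone sketch. This is not kernel code: the `SKETCH_` names, the helper functions, and the numeric values are assumptions made for the example. It assumes only that the states are ordered NEW < OPENING < OPEN and that nothing sorts below NEW.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative copy of the first three session states. The names and
 * values are assumptions for this sketch, not quoted from mds_client.h.
 */
enum sketch_session_state {
	SKETCH_SESSION_NEW = 1,
	SKETCH_SESSION_OPENING = 2,
	SKETCH_SESSION_OPEN = 3,
};

/* Check before the patch: skip every session that is not yet OPEN. */
static bool skip_before_patch(int state)
{
	return state < SKETCH_SESSION_OPEN;
}

/* Check after the patch: skip only states ordered before OPENING. */
static bool skip_after_patch(int state)
{
	return state < SKETCH_SESSION_OPENING;
}

/* Alternative raised in the review: name the skipped state explicitly. */
static bool skip_explicit(int state)
{
	return state == SKETCH_SESSION_NEW;
}

/*
 * Hypothetical self-check: over the three states modelled here, the
 * patched comparison and the explicit comparison agree.
 */
static void sketch_self_check(void)
{
	for (int s = SKETCH_SESSION_NEW; s <= SKETCH_SESSION_OPEN; s++)
		assert(skip_after_patch(s) == skip_explicit(s));
}
```

Under these assumptions, an OPENING session is skipped by the old check but not by the patched one, so it reaches the keepalive path. The `<` and `==` forms agree only while NEW is the sole state ordered below OPENING; a state with a smaller value would be skipped by `<` but not by `==`.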