On Mon, Oct 12, 2020 at 5:13 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> Some messages sent by the MDS entail a session sequence number
> increment, and the MDS will drop certain types of requests on the floor
> when the sequence numbers don't match.
>
> In particular, a REQUEST_CLOSE message can cross with one of sequence
> morphing messages from the MDS, which can cause the client to stall,
> waiting for a response that will never come.
>
> Originally, this meant an up to 5s delay before the recurring workqueue
> job kicked in and resent the request, but a recent change made it so
> that the client would never resend, causing a 60s stall unmounting and
> sometimes a blocklisting event.
>
> Fix this by checking the connection state after bumping the session
> sequence, which should cause a retransmit of the REQUEST_CLOSE, when
> this occurs.
>
> URL: https://tracker.ceph.com/issues/47563
> Fixes: fa9967734227 ("ceph: fix potential mdsc use-after-free crash")
> Reported-by: Patrick Donnelly <pdonnell@xxxxxxxxxx>
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> ---
>  fs/ceph/caps.c       | 1 +
>  fs/ceph/mds_client.c | 1 +
>  fs/ceph/quota.c      | 1 +
>  fs/ceph/snap.c       | 1 +
>  4 files changed, 4 insertions(+)
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index c00abd7eefc1..ac822c74baea 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -4072,6 +4072,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
>
>         mutex_lock(&session->s_mutex);
>         session->s_seq++;
> +       check_session_state(session);
>         dout(" mds%d seq %lld cap seq %u\n", session->s_mds, session->s_seq,
>              (unsigned)seq);
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 0190555b1f9e..69f529d894e6 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4238,6 +4238,7 @@ static void handle_lease(struct ceph_mds_client *mdsc,
>
>         mutex_lock(&session->s_mutex);
>         session->s_seq++;
> +       check_session_state(session);
>
>         if (!inode) {
>                 dout("handle_lease no inode %llx\n", vino.ino);
> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> index 83cb4f26b689..a09667ee83c1 100644
> --- a/fs/ceph/quota.c
> +++ b/fs/ceph/quota.c
> @@ -54,6 +54,7 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
>         /* increment msg sequence number */
>         mutex_lock(&session->s_mutex);
>         session->s_seq++;
> +       check_session_state(session);
>         mutex_unlock(&session->s_mutex);
>
>         /* lookup inode */
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index 0da39c16dab4..f1e73a65f4a5 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -874,6 +874,7 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
>
>         mutex_lock(&session->s_mutex);
>         session->s_seq++;
> +       check_session_state(session);
>         mutex_unlock(&session->s_mutex);
>
>         down_write(&mdsc->snap_rwsem);
> --
> 2.26.2
>

A new helper just for

    if (s->s_state == CEPH_MDS_SESSION_CLOSING) {
            dout("resending session close request for mds%d\n", s->s_mds);
            request_close_session(s);
    }

would be more precise IMO.  It could check the return value of
request_close_session() and log the error, too.

Thanks,

                Ilya
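
For illustration only, the helper suggested above could look roughly
like the following userspace sketch.  Everything in it is a stand-in:
struct mds_session, the SESSION_* states, the fail_alloc knob and the
helper name resend_close_if_closing() are hypothetical, and the mock
request_close_session() merely assumes the real function returns a
negative errno on failure.  It is not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
enum session_state { SESSION_OPEN, SESSION_CLOSING };

struct mds_session {
	int mds;                   /* MDS rank, mirrors s_mds */
	enum session_state state;  /* mirrors s_state */
	unsigned long long seq;    /* mirrors s_seq */
	int close_requests_sent;   /* sketch-only counter */
	int fail_alloc;            /* sketch-only: simulate a failed send */
};

/* Mock of request_close_session(): assumed to return <0 on failure. */
static int request_close_session(struct mds_session *s)
{
	if (s->fail_alloc)
		return -ENOMEM;
	s->close_requests_sent++;
	return 1;
}

/*
 * Hypothetical helper: resend the close request only when the session
 * is already CLOSING, and log an error if the resend fails.
 * Returns 0 if nothing was done, else request_close_session()'s result.
 */
static int resend_close_if_closing(struct mds_session *s)
{
	int ret = 0;

	assert(s != NULL);
	if (s->state == SESSION_CLOSING) {
		fprintf(stderr, "resending session close request for mds%d\n",
			s->mds);
		ret = request_close_session(s);
		if (ret < 0)
			fprintf(stderr,
				"unable to close session to mds%d: %d\n",
				s->mds, ret);
	}
	return ret;
}
```

Each of the four call sites in the patch would then invoke the helper
right after session->s_seq++, still under s_mutex, in place of
check_session_state().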