On Mon, Oct 12, 2020 at 7:21 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > Some messages sent by the MDS entail a session sequence number > increment, and the MDS will drop certain types of requests on the floor > when the sequence numbers don't match. > > In particular, a REQUEST_CLOSE message can cross with one of the > sequence morphing messages from the MDS which can cause the client to > stall, waiting for a response that will never come. > > Originally, this meant an up to 5s delay before the recurring workqueue > job kicked in and resent the request, but a recent change made it so > that the client would never resend, causing a 60s stall unmounting and > sometimes a blockisting event. > > Add a new helper for incrementing the session sequence and then testing > to see whether a REQUEST_CLOSE needs to be resent. Change all of the > bare sequence counter increments to use the new helper. > > URL: https://tracker.ceph.com/issues/47563 > Fixes: fa9967734227 ("ceph: fix potential mdsc use-after-free crash") > Reported-by: Patrick Donnelly <pdonnell@xxxxxxxxxx> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > --- > fs/ceph/caps.c | 2 +- > fs/ceph/mds_client.c | 35 +++++++++++++++++++++++++++++------ > fs/ceph/mds_client.h | 1 + > fs/ceph/quota.c | 2 +- > fs/ceph/snap.c | 2 +- > 5 files changed, 33 insertions(+), 9 deletions(-) > > v2: move seq increment and check for closed session into new helper > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c > index c00abd7eefc1..ba0e4f44862c 100644 > --- a/fs/ceph/caps.c > +++ b/fs/ceph/caps.c > @@ -4071,7 +4071,7 @@ void ceph_handle_caps(struct ceph_mds_session *session, > vino.snap, inode); > > mutex_lock(&session->s_mutex); > - session->s_seq++; > + inc_session_sequence(session); > dout(" mds%d seq %lld cap seq %u\n", session->s_mds, session->s_seq, > (unsigned)seq); > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > index 0190555b1f9e..17b94f06826a 100644 > --- a/fs/ceph/mds_client.c > +++ b/fs/ceph/mds_client.c > @@ -4237,7 +4237,7 @@ static void handle_lease(struct ceph_mds_client *mdsc, > dname.len, dname.name); > > mutex_lock(&session->s_mutex); > - session->s_seq++; > + inc_session_sequence(session); > > if (!inode) { > dout("handle_lease no inode %llx\n", vino.ino); > @@ -4384,14 +4384,25 @@ static void maybe_recover_session(struct ceph_mds_client *mdsc) > ceph_force_reconnect(fsc->sb); > } > > +static bool check_session_closing(struct ceph_mds_session *s) > +{ > + int ret; > + > + if (s->s_state != CEPH_MDS_SESSION_CLOSING) > + return true; > + > + dout("resending session close request for mds%d\n", s->s_mds); > + ret = request_close_session(s); > + if (ret < 0) > + pr_err("ceph: Unable to close session to mds %d: %d\n", s->s_mds, ret); > + return false; > +} > + > bool check_session_state(struct ceph_mds_session *s) > { > - if (s->s_state == CEPH_MDS_SESSION_CLOSING) { > - dout("resending session close request for mds%d\n", > - s->s_mds); > - request_close_session(s); > + if (!check_session_closing(s)) Do we actually need it in check_session_state() now? Given that it is conditioned on CEPH_MDS_SESSION_CLOSING and we know check_session_state() doesn't get called in that case, it seems like dead code to me. Thanks, Ilya