On Tue, 2022-03-08 at 21:23 +0800, xiubli@xxxxxxxxxx wrote: > From: Xiubo Li <xiubli@xxxxxxxxxx> > > When reconnecting MDS it will reopen the con with new ip address, > but the when opening the con with new address it couldn't be sure > that the stale work has finished. So it's possible that the stale > work queued will use the new data. > > This will use cancel_delayed_work_sync() instead. > > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> > --- > > V2: > - Call cancel_con() after dropping the mutex > > > net/ceph/messenger.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index d3bb656308b4..62e39f63f94c 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -581,8 +581,8 @@ void ceph_con_close(struct ceph_connection *con) > > ceph_con_reset_protocol(con); > ceph_con_reset_session(con); > - cancel_con(con); > mutex_unlock(&con->mutex); > + cancel_con(con); Now the question is: Is it safe to cancel this work outside the mutex or will this open up any races. Unfortunately with coarse-grained locks like this, it's hard to tell what the lock actually protects. If we need to keep the cancel inside the lock for some reason, you could instead just add a "flush_workqueue()" after dropping the mutex in the above function. So, this looks reasonable to me at first glance, but I'd like Ilya to ack this before we merge it. > } > EXPORT_SYMBOL(ceph_con_close); > > @@ -1416,7 +1416,7 @@ static void queue_con(struct ceph_connection *con) > > static void cancel_con(struct ceph_connection *con) > { > - if (cancel_delayed_work(&con->work)) { > + if (cancel_delayed_work_sync(&con->work)) { > dout("%s %p\n", __func__, con); > con->ops->put(con); > } -- Jeff Layton <jlayton@xxxxxxxxxx>