Re: [PATCH v2] libceph: wait for con->work to finish when cancelling con

On 3/8/22 11:36 PM, Ilya Dryomov wrote:
On Tue, Mar 8, 2022 at 3:24 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:

On 3/8/22 9:37 PM, Jeff Layton wrote:
On Tue, 2022-03-08 at 21:23 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

When reconnecting to the MDS, the con is reopened with a new IP
address, but when opening the con with the new address there is no
guarantee that the stale work has finished. So it's possible that the
stale queued work will use the new data.

Use cancel_delayed_work_sync() instead.

Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---

V2:
- Call cancel_con() after dropping the mutex


   net/ceph/messenger.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index d3bb656308b4..62e39f63f94c 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -581,8 +581,8 @@ void ceph_con_close(struct ceph_connection *con)

      ceph_con_reset_protocol(con);
      ceph_con_reset_session(con);
-    cancel_con(con);
      mutex_unlock(&con->mutex);
+    cancel_con(con);
Now the question is: is it safe to cancel this work outside the mutex,
or will this open up any races? Unfortunately, with coarse-grained locks
like this, it's hard to tell what the lock actually protects.

If we need to keep the cancel inside the lock for some reason, you could
instead just add a "flush_workqueue()" after dropping the mutex in the
above function.
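
Roughly like this, presumably (just a sketch; assuming the con work runs
on the messenger workqueue ceph_msgr_wq):

        ceph_con_reset_protocol(con);
        ceph_con_reset_session(con);
        cancel_con(con);                /* still under con->mutex */
        mutex_unlock(&con->mutex);

        /* wait for an already-running ceph_con_workfn() to finish */
        flush_workqueue(ceph_msgr_wq);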

So, this looks reasonable to me at first glance, but I'd like Ilya to
ack this before we merge it.
IMO it should be okay, since queue_con(con), which does similar
things, is also called outside the mutex.
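
For example, the tail of ceph_con_send() follows that pattern, roughly
(a simplified sketch from memory, not the literal messenger.c code;
'kick' stands in for the real write-pending check):

        mutex_lock(&con->mutex);
        /* ... append msg to con->out_queue ... */
        mutex_unlock(&con->mutex);

        /* the work is (re)queued only after con->mutex is dropped */
        if (kick)
                queue_con(con);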
Hi Xiubo,

I read the patch description and skimmed through the linked trackers
but I don't understand the issue.  ceph_con_workfn() holds con->mutex
for most of the time it's running and cancel_delayed_work() is called
under the same con->mutex.  It's true that ceph_con_workfn() work may
not be finished by the time ceph_con_close() returns but I don't see
how that can result in anything bad happening.
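
That is, the existing structure is roughly (a simplified skeleton, not
the literal code):

        static void ceph_con_workfn(struct work_struct *work)
        {
                struct ceph_connection *con =
                        container_of(work, struct ceph_connection, work.work);

                mutex_lock(&con->mutex);
                /* ... all connection state handling happens here ... */
                mutex_unlock(&con->mutex);
        }

with ceph_con_close() doing the cancel_delayed_work() under the same
con->mutex.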

Can you explain the issue in more detail, with pointers to specific
code snippets in the MDS client and the messenger?  Where exactly does
the "new data" (what data, please be specific) get misused by the
"stale work"?

The tracker I attached in V1 is not exactly related to this, please ignore it.

In the current code, there is one such case for ceph fs, in send_mds_reconnect():

        ceph_con_close(&session->s_con);
        ceph_con_open(&session->s_con,
                      CEPH_ENTITY_TYPE_MDS, mds,
                      ceph_mdsmap_get_addr(mdsc->mdsmap, mds));

Just before ceph_con_close() cancels con->work, the work may have already fired, with the worker thread then scheduled out while con->work is trying to take con->mutex.

And then ceph_con_open() will update con->state to 'CEPH_CON_S_PREOPEN' and other members, and then queue con->work again.

That means con->work will be run twice with the state 'CEPH_CON_S_PREOPEN'.
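
The interleaving I have in mind is roughly (illustrative only):

        reconnect thread                    kworker (con->work)
        ----------------                    -------------------
        ceph_con_close()
          mutex_lock(&con->mutex)           work already dequeued and running,
          cancel_delayed_work()             blocked in mutex_lock(&con->mutex)
          mutex_unlock(&con->mutex)
        ceph_con_open()
          mutex_lock(&con->mutex)
          con->state = CEPH_CON_S_PREOPEN
          mutex_unlock(&con->mutex)
          queue con->work again
                                            old instance gets the mutex and
                                            runs with CEPH_CON_S_PREOPEN
                                            new instance runs again afterwards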

I am not sure whether this can cause strange issues like the one in the URL I attached in V1.

- Xiubo

Thanks,

                 Ilya




