On 3/8/22 7:45 PM, Jeff Layton wrote:
On Tue, 2022-03-08 at 17:59 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
When reconnecting to an MDS, the client reopens the con with the new IP
address, but at that point it cannot be sure that the stale work has
finished. So it is possible that the stale queued work will use the
new data.
Use cancel_delayed_work_sync() instead.
URL: https://tracker.ceph.com/issues/54461
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
net/ceph/messenger.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index d3bb656308b4..32eb5dc00583 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1416,7 +1416,7 @@ static void queue_con(struct ceph_connection *con)
 static void cancel_con(struct ceph_connection *con)
 {
-	if (cancel_delayed_work(&con->work)) {
+	if (cancel_delayed_work_sync(&con->work)) {
 		dout("%s %p\n", __func__, con);
 		con->ops->put(con);
 	}
Won't this deadlock?
This function is called from ceph_con_close with the con->mutex held.
The work will try to take the same mutex and will get stuck. If you want
to do this, then you may also need to change it to call cancel_con after
dropping the mutex.
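
The ordering suggested above can be sketched in userspace. This is only an analogy, not the messenger code: it assumes pthreads in place of the kernel workqueue, and every name in it (fake_con, fake_con_work, fake_cancel_sync, fake_con_close, demo) is hypothetical. pthread_join() stands in for only the "wait for running work to finish" half of cancel_delayed_work_sync(); it does not cancel anything.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ceph_connection. */
struct fake_con {
	pthread_mutex_t mutex;	/* plays the role of con->mutex */
	pthread_t worker;	/* plays the role of con->work */
	int work_runs;		/* how many times the work body ran */
};

/* Work body: like the real con work, it needs the connection mutex. */
static void *fake_con_work(void *arg)
{
	struct fake_con *con = arg;

	pthread_mutex_lock(&con->mutex);
	con->work_runs++;
	pthread_mutex_unlock(&con->mutex);
	return NULL;
}

/* Blocks until the work has finished (the "_sync" behaviour only). */
static void fake_cancel_sync(struct fake_con *con)
{
	pthread_join(con->worker, NULL);
}

/*
 * Safe ordering: update state under the mutex, drop it, then wait.
 * If fake_cancel_sync() were called while still holding con->mutex
 * and the worker had not yet taken the lock, the join would never
 * return, because the worker would be blocked on the same mutex.
 */
static int fake_con_close(struct fake_con *con)
{
	pthread_mutex_lock(&con->mutex);
	/* ... reset connection state here ... */
	pthread_mutex_unlock(&con->mutex);

	fake_cancel_sync(con);
	return con->work_runs;
}

/* Queue one piece of work, close the connection, report the run count. */
static int demo(void)
{
	struct fake_con con = { .work_runs = 0 };
	int rc, runs;

	pthread_mutex_init(&con.mutex, NULL);
	rc = pthread_create(&con.worker, NULL, fake_con_work, &con);
	assert(rc == 0);

	runs = fake_con_close(&con);
	pthread_mutex_destroy(&con.mutex);
	return runs;
}
```

Calling demo() returns once the worker has run exactly once, whichever thread wins the race for the mutex. Moving the fake_cancel_sync() call above the unlock reproduces the hang described above whenever the worker loses that race.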
Yeah, correct :-)
- Xiubo