Re: [PATCH v2] libceph: wait for con->work to finish when cancelling con

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 08 Mar 2022 08:37:02 -0500

On Tue, 2022-03-08 at 21:23 +0800, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> 
> When reconnecting MDS it will reopen the con with new ip address,
> but the when opening the con with new address it couldn't be sure
> that the stale work has finished. So it's possible that the stale
> work queued will use the new data.
> 
> This will use cancel_delayed_work_sync() instead.
> 
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
> 
> V2:
> - Call cancel_con() after dropping the mutex
> 
> 
>  net/ceph/messenger.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index d3bb656308b4..62e39f63f94c 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -581,8 +581,8 @@ void ceph_con_close(struct ceph_connection *con)
>  
>  	ceph_con_reset_protocol(con);
>  	ceph_con_reset_session(con);
> -	cancel_con(con);
>  	mutex_unlock(&con->mutex);
> +	cancel_con(con);

Now the question is: Is it safe to cancel this work outside the mutex or
will this open up any races. Unfortunately with coarse-grained locks
like this, it's hard to tell what the lock actually protects.

If we need to keep the cancel inside the lock for some reason, you could
instead just add a "flush_workqueue()" after dropping the mutex in the
above function.

So, this looks reasonable to me at first glance, but I'd like Ilya to
ack this before we merge it.

>  }
>  EXPORT_SYMBOL(ceph_con_close);
>  
> @@ -1416,7 +1416,7 @@ static void queue_con(struct ceph_connection *con)
>  
>  static void cancel_con(struct ceph_connection *con)
>  {
> -	if (cancel_delayed_work(&con->work)) {
> +	if (cancel_delayed_work_sync(&con->work)) {
>  		dout("%s %p\n", __func__, con);
>  		con->ops->put(con);
>  	}

-- 
Jeff Layton <jlayton@xxxxxxxxxx>