On 10/1/19 8:07 AM, Jason Gunthorpe wrote:
On Mon, Sep 30, 2019 at 04:16:53PM -0700, Bart Van Assche wrote:
Instead of calling rdma_destroy_id() after waiting for the context completion
finished, call rdma_destroy_id() from inside the ucma_put_ctx() function.
This patch reduces the number of rdma_destroy_id() calls but does not change
the behavior of this code.
Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
drivers/infiniband/core/ucma.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 0274e9b704be..30c09864fd9e 100644
+++ b/drivers/infiniband/core/ucma.c
@@ -160,8 +160,14 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id)
static void ucma_put_ctx(struct ucma_context *ctx)
{
- if (atomic_dec_and_test(&ctx->ref))
- complete(&ctx->comp);
+ if (!atomic_dec_and_test(&ctx->ref))
+ return;
+ /*
+ * rdma_destroy_id() ensures that no event handlers are inflight
+ * for that id before releasing it.
+ */
+ rdma_destroy_id(ctx->cm_id);
+ complete(&ctx->comp);
}
Since all the refcounting here is basically insane, you can't do this
without creating new kinds of bugs related to lifetime of ctx->cm_id
The call to rdma_destroy_id must be after the xa_erase as other
threads can continue to access the context despite its zero ref via
ucma_get_ctx(() as ucma_get_ctx() is not using refcounts properly.
The xa_erase provides the needed barrier.
Maybe this patch could be fixed if the ucma_get_ctx used an
atomic_inc_not_zero ?
Hi Jason,
Since I'm not an ucma expert, maybe it's better that I drop this patch.
Thanks,
Bart.