On Mon, Sep 30, 2019 at 04:16:53PM -0700, Bart Van Assche wrote: > Instead of calling rdma_destroy_id() after waiting for the context completion > finished, call rdma_destroy_id() from inside the ucma_put_ctx() function. > This patch reduces the number of rdma_destroy_id() calls but does not change > the behavior of this code. > > Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx> > drivers/infiniband/core/ucma.c | 17 ++++++++--------- > 1 file changed, 8 insertions(+), 9 deletions(-) > > diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c > index 0274e9b704be..30c09864fd9e 100644 > +++ b/drivers/infiniband/core/ucma.c > @@ -160,8 +160,14 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id) > > static void ucma_put_ctx(struct ucma_context *ctx) > { > - if (atomic_dec_and_test(&ctx->ref)) > - complete(&ctx->comp); > + if (!atomic_dec_and_test(&ctx->ref)) > + return; > + /* > + * rdma_destroy_id() ensures that no event handlers are inflight > + * for that id before releasing it. > + */ > + rdma_destroy_id(ctx->cm_id); > + complete(&ctx->comp); > } Since all the refcounting here is basically insane, you can't do this without creating new kinds of bugs related to lifetime of ctx->cm_id The call to rdma_destroy_id must be after the xa_erase as other threads can continue to access the context despite its zero ref via ucma_get_ctx(() as ucma_get_ctx() is not using refcounts properly. The xa_erase provides the needed barrier. Maybe this patch could be fixed if the ucma_get_ctx used an atomic_inc_not_zero ? Jason