RE: [PATCH v5] IB: Improve uverbs_cleanup_ucontext algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Jason Gunthorpe
> Sent: Thursday, June 28, 2018 5:59 PM
> To: linux-rdma@xxxxxxxxxxxxxxx
> Cc: Yishai Hadas <yishaih@xxxxxxxxxxxx>; Leon Romanovsky
> <leonro@xxxxxxxxxxxx>
> Subject: [PATCH v5] IB: Improve uverbs_cleanup_ucontext algorithm
> 
> Improve uverbs_cleanup_ucontext algorithm to work properly when the
> topology graph of the objects cannot be determined at compile time.  This is the
> case with objects created via the devx interface in mlx5.
> 
> Typically uverbs objects must be created in a strict topologically sorted order, so
> that LIFO ordering will generally cause them to be freed properly. There are only
> a few cases (eg memory windows) where objects can point to things out of the
> strict LIFO order.
> 
> Instead of using an explicit ordering scheme where the HW destroy is not
> allowed to fail, go over the list multiple times and allow the destroy function to
> fail. If progress halts then a final, desperate, cleanup is done before leaking the
> memory. This indicates a driver bug.
> 
> Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> ---
>  drivers/infiniband/core/rdma_core.c           | 113 ++++++++++--------
>  drivers/infiniband/core/uverbs.h              |   2 +-
>  drivers/infiniband/core/uverbs_cmd.c          |  11 +-
>  drivers/infiniband/core/uverbs_std_types.c    |  62 ++++++----
>  .../core/uverbs_std_types_counters.c          |   9 +-
>  drivers/infiniband/core/uverbs_std_types_cq.c |  18 +--
>  drivers/infiniband/core/uverbs_std_types_dm.c |   9 +-
>  .../core/uverbs_std_types_flow_action.c       |   9 +-
>  drivers/infiniband/core/uverbs_std_types_mr.c |   3 +-
>  drivers/infiniband/hw/mlx5/devx.c             |   8 +-
>  include/rdma/ib_verbs.h                       |  46 ++++++-
>  include/rdma/uverbs_types.h                   |  11 +-
>  12 files changed, 190 insertions(+), 111 deletions(-)
> 
> 
> Just posting the complete patch after the edits I made so everyone can check it
> fully.
> 
> v5:
>  - Added a comment in uverbs_free_qp [Yishai]
>  - Correcting grammar in ib_destroy_usecnt comment [Yishai]
>  - Added a note to ib_is_destroy_retryable about
>    object locking (explains why the odd looking atomic_read is safe)
>    [Leon]
>  - Commit message reworded
> 
> diff --git a/drivers/infiniband/core/rdma_core.c
> b/drivers/infiniband/core/rdma_core.c
> index df3c405332525a..2ddf1c716ba801 100644
> --- a/drivers/infiniband/core/rdma_core.c
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -360,9 +360,10 @@ static int __must_check
> remove_commit_idr_uobject(struct ib_uobject *uobj,
> 
>  	/*
>  	 * We can only fail gracefully if the user requested to destroy the
> -	 * object. In the rest of the cases, just remove whatever you can.
> +	 * object or when a retry may be called upon an error.
> +	 * In the rest of the cases, just remove whatever you can.
>  	 */
> -	if (why == RDMA_REMOVE_DESTROY && ret)
> +	if (ib_is_destroy_retryable(ret, why, uobj))
>  		return ret;
> 
>  	ib_rdmacg_uncharge(&uobj->cg_obj, uobj->context->device, @@ -
> 393,7 +394,7 @@ static int __must_check remove_commit_fd_uobject(struct
> ib_uobject *uobj,
>  		container_of(uobj, struct ib_uobject_file, uobj);
>  	int ret = fd_type->context_closed(uobj_file, why);
> 
> -	if (why == RDMA_REMOVE_DESTROY && ret)
> +	if (ib_is_destroy_retryable(ret, why, uobj))
>  		return ret;
> 
>  	if (why == RDMA_REMOVE_DURING_CLEANUP) { @@ -422,7 +423,7
> @@ static int __must_check _rdma_remove_commit_uobject(struct ib_uobject
> *uobj,
>  	struct ib_ucontext *ucontext = uobj->context;
> 
>  	ret = uobj->type->type_class->remove_commit(uobj, why);
> -	if (ret && why == RDMA_REMOVE_DESTROY) {
> +	if (ib_is_destroy_retryable(ret, why, uobj)) {
>  		/* We couldn't remove the object, so just unlock the uobject */
>  		atomic_set(&uobj->usecnt, 0);
>  		uobj->type->type_class->lookup_put(uobj, true); @@ -645,61
> +646,77 @@ void uverbs_close_fd(struct file *f)
>  	kref_put(uverbs_file_ref, ib_uverbs_release_file);  }
> 
> -void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool
> device_removed)
> +static int __uverbs_cleanup_ucontext(struct ib_ucontext *ucontext,
> +				    enum rdma_remove_reason reason)
>  {
> -	enum rdma_remove_reason reason = device_removed ?
> -		RDMA_REMOVE_DRIVER_REMOVE : RDMA_REMOVE_CLOSE;
> -	unsigned int cur_order = 0;
> +	struct ib_uobject *obj, *next_obj;
> +	int ret = -EINVAL;
> +	int err = 0;
> 
> +	/*
> +	 * This shouldn't run while executing other commands on this
> +	 * context. Thus, the only thing we should take care of is
> +	 * releasing a FD while traversing this list. The FD could be
> +	 * closed and released from the _release fop of this FD.
> +	 * In order to mitigate this, we add a lock.
> +	 * We take and release the lock per traversal in order to let
> +	 * other threads (which might still use the FDs) chance to run.
> +	 */
> +	mutex_lock(&ucontext->uobjects_lock);
>  	ucontext->cleanup_reason = reason;
Any reason to protect writing cleanup_reason?
Couldn't find a reader that also holds uobjects_lock while reading.

> +	list_for_each_entry_safe(obj, next_obj, &ucontext->uobjects, list) {
> +		/*
> +		 * if we hit this WARN_ON, that means we are
> +		 * racing with a lookup_get.
> +		 */
> +		WARN_ON(uverbs_try_lock_object(obj, true));
> +		err = obj->type->type_class->remove_commit(obj, reason);
> +
> +		if (ib_is_destroy_retryable(err, reason, obj)) {
> +			pr_debug("ib_uverbs: failed to remove uobject id %d err
> %d\n",
> +				 obj->id, err);
> +			atomic_set(&obj->usecnt, 0);
> +			continue;
> +		}
> +
> +		if (err)
> +			pr_err("ib_uverbs: unable to remove uobject id %d err
> %d\n",
> +				obj->id, err);
> +
> +		list_del(&obj->list);
> +		/* put the ref we took when we created the object */
> +		uverbs_uobject_put(obj);
> +		ret = 0;
> +	}
> +	mutex_unlock(&ucontext->uobjects_lock);
> +	return ret;

If there are no objects present, won't this function return error because at beginning ret = -EINVAL?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux