Re: [PATCH rc] RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 16, 2021 at 03:34:46PM -0300, Jason Gunthorpe wrote:
> The FSM can run in a circle allowing rdma_resolve_ip() to be called twice
> on the same id_priv. While this cannot happen without going through the
> work, it violates the invariant that the same address resolution
> background request cannot be active twice.
> 
>        CPU 1                                  CPU 2
> 
> rdma_resolve_addr():
>   RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY
>   rdma_resolve_ip(addr_handler)  #1
> 
> 			 process_one_req(): for #1
>                           addr_handler():
>                             RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND
>                             mutex_unlock(&id_priv->handler_mutex);
>                             [.. handler still running ..]
> 
> rdma_resolve_addr():
>   RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY
>   rdma_resolve_ip(addr_handler)
>     !! two requests are now on the req_list
> 
> rdma_destroy_id():
>  destroy_id_handler_unlock():
>   _destroy_id():
>    cma_cancel_operation():
>     rdma_addr_cancel()
> 
>                           // process_one_req() self removes it
> 		          spin_lock_bh(&lock);
>                            cancel_delayed_work(&req->work);
> 	                   if (!list_empty(&req->list)) == true
> 
>       ! rdma_addr_cancel() returns after process_on_req #1 is done
> 
>    kfree(id_priv)
> 
> 			 process_one_req(): for #2
>                           addr_handler():
> 	                    mutex_lock(&id_priv->handler_mutex);
>                             !! Use after free on id_priv
> 
> rdma_addr_cancel() expects there to be one req on the list and only
> cancels the first one. The self-removal behavior of the work only happens
> after the handler has returned. This yields a situations where the
> req_list can have two reqs for the same "handle" but rdma_addr_cancel()
> only cancels the first one.
> 
> The second req remains active beyond rdma_destroy_id() and will
> use-after-free id_priv once it inevitably triggers.
> 
> Fix this by remembering if the id_priv has called rdma_resolve_ip() and
> always cancel before calling it again. This ensures the req_list never
> gets more than one item in it and doesn't cost anything in the normal flow
> that never uses this strange error path.
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: e51060f08a61 ("IB: IP address based RDMA connection manager")
> Reported-by: syzbot+dc3dfba010d7671e05f5@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> ---
>  drivers/infiniband/core/cma.c      | 17 +++++++++++++++++
>  drivers/infiniband/core/cma_priv.h |  1 +
>  2 files changed, 18 insertions(+)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index c40791baced588..751cf5ea25f296 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -1776,6 +1776,14 @@ static void cma_cancel_operation(struct rdma_id_private *id_priv,
>  {
>  	switch (state) {
>  	case RDMA_CM_ADDR_QUERY:
> +		/*
> +		 * We can avoid doing the rdma_addr_cancel() based on state,
> +		 * only RDMA_CM_ADDR_QUERY has a work that could still execute.
> +		 * Notice that the addr_handler work could still be exiting
> +		 * outside this state, however due to the interaction with the
> +		 * handler_mutex the work is guaranteed not to touch id_priv
> +		 * during exit.
> +		 */
>  		rdma_addr_cancel(&id_priv->id.route.addr.dev_addr);
>  		break;
>  	case RDMA_CM_ROUTE_QUERY:
> @@ -3413,6 +3421,15 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
>  		if (dst_addr->sa_family == AF_IB) {
>  			ret = cma_resolve_ib_addr(id_priv);
>  		} else {
> +			/* The FSM can return back to RDMA_CM_ADDR_BOUND after
> +			 * rdma_resolve_ip() is called, eg through the error
> +			 * path in addr_handler. If this happens the existing
> +			 * request must be canceled before issuing a new one.
> +			 */
> +			if (id_priv->used_resolve_ip)
> +				rdma_addr_cancel(&id->route.addr.dev_addr);
> +			else
> +				id_priv->used_resolve_ip = 1;

Why don't you never clear this field? If you assume that this is one lifetime
event, can you please add a comment with an explanation "why"?

Thanks

>  			ret = rdma_resolve_ip(cma_src_addr(id_priv), dst_addr,
>  					      &id->route.addr.dev_addr,
>  					      timeout_ms, addr_handler,
> diff --git a/drivers/infiniband/core/cma_priv.h b/drivers/infiniband/core/cma_priv.h
> index 5c463da9984536..f92f101ea9818f 100644
> --- a/drivers/infiniband/core/cma_priv.h
> +++ b/drivers/infiniband/core/cma_priv.h
> @@ -91,6 +91,7 @@ struct rdma_id_private {
>  	u8			afonly;
>  	u8			timeout;
>  	u8			min_rnr_timer;
> +	u8 used_resolve_ip;
>  	enum ib_gid_type	gid_type;
>  
>  	/*
> 
> base-commit: ad17bbef3dd573da937816edc0ab84fed6a17fa6
> -- 
> 2.33.0
> 



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux