Re: [PATCH for-next 3/9] RDMA/hns: Completely release qp resources when hw err

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2019-08-09 at 17:41 +0800, Lijun Ou wrote:
> From: Yangyang Li <liyangyang20@xxxxxxxxxx>
> 
> Even if no response from hardware, make sure that qp related
> resources are completely released.
> 
> Signed-off-by: Yangyang Li <liyangyang20@xxxxxxxxxx>
> ---
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index 7a14f0b..0409851 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -4562,16 +4562,14 @@ static int
> hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
>  {
>  	struct hns_roce_cq *send_cq, *recv_cq;
>  	struct ib_device *ibdev = &hr_dev->ib_dev;
> -	int ret;
> +	int ret = 0;
>  
>  	if (hr_qp->ibqp.qp_type == IB_QPT_RC && hr_qp->state !=
> IB_QPS_RESET) {
>  		/* Modify qp to reset before destroying qp */
>  		ret = hns_roce_v2_modify_qp(&hr_qp->ibqp, NULL, 0,
>  					    hr_qp->state, IB_QPS_RESET);
> -		if (ret) {
> +		if (ret)
>  			ibdev_err(ibdev, "modify QP to Reset
> failed.\n");
> -			return ret;
> -		}
>  	}
>  
>  	send_cq = to_hr_cq(hr_qp->ibqp.send_cq);
> @@ -4627,7 +4625,7 @@ static int hns_roce_v2_destroy_qp_common(struct
> hns_roce_dev *hr_dev,
>  		kfree(hr_qp->rq_inl_buf.wqe_list);
>  	}
>  
> -	return 0;
> +	return ret;
>  }
>  
>  static int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata
> *udata)
> @@ -4637,11 +4635,9 @@ static int hns_roce_v2_destroy_qp(struct ib_qp
> *ibqp, struct ib_udata *udata)
>  	int ret;
>  
>  	ret = hns_roce_v2_destroy_qp_common(hr_dev, hr_qp, udata);
> -	if (ret) {
> +	if (ret)
>  		ibdev_err(&hr_dev->ib_dev, "Destroy qp 0x%06lx
> failed(%d)\n",
>  			  hr_qp->qpn, ret);
> -		return ret;
> -	}
>  
>  	if (hr_qp->ibqp.qp_type == IB_QPT_GSI)
>  		kfree(hr_to_hr_sqp(hr_qp));

I don't know your hardware, but this patch sounds wrong/dangerous to me.
As long as the resources this card might access are allocated by the
kernel, you can't get random data corruption by the card writing to
memory used elsewhere in the kernel.  So if your card is not responding
to your requests to free the resources, it would seem safer to leak
those resources permanently than to free them and risk the card coming
back to life long enough to corrupt memory reallocated to some other
task.

Only if you can guarantee me that there is no way your commands to the
card will fail and then the card start working again later would I
consider this patch safe.  And if it's possible for the card to hang
like this, should that be triggering a reset of the device?

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

Attachment: signature.asc
Description: This is a digitally signed message part


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux