Re: 【A question about kernel verbs API】

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



在 2019/6/15 15:44, Leon Romanovsky 写道:
> On Fri, Jun 14, 2019 at 09:16:15AM +0800, oulijun wrote:
>> Hi, Jason Gunthorpe & Leon Romanovsky& Parav Pandit
>>    Recently when I was learning kernel ofed code, I found an interesting thing about verbs, the implementation rely on
>> roce driver,  taking ib_dereg_mr for example.
>>
>> When the driver returns error, the reference count of rdma resource(pd, mr, etc.) won't be decreased. I worried that it
>> will cause a memory leak.
> I assume that we are talking about lack of rdma_restrack_put(pd) inside
> of ib_dereg_mr(). It is done on purpose because purpose of restrack is
> to protect memory from being released async, for example QP during netlink
> queries. In sync flows, it is response of uverbs and ULPs to release
> resources in proper order, which they do.
>
>> int ib_dereg_mr(struct ib_mr *mr)
>> {
>> 	struct ib_pd *pd = mr->pd;
>> 	struct ib_dm *dm = mr->dm;
>> 	int ret;
>>
>> 	rdma_restrack_del(&mr->res);
>> 	ret = mr->device->ops.dereg_mr(mr);
>> 	if (!ret) {
>> 		atomic_dec(&pd->usecnt);
>> 		if (dm)
>> 			atomic_dec(&dm->usecnt);
>> 	}
>>
>> 	return ret;
>> }
>>
>> and it is even worse in ib_destroy_qp, rdma_put_gid_attr is not called, so that the net_device will be hold all the time, and caused the
>> pcie remove scenario hanging.
>>
>> The following code is:
>> int ib_destroy_qp(struct ib_qp *qp)
>> {
>> 	const struct ib_gid_attr *alt_path_sgid_attr = qp->alt_path_sgid_attr;
>> 	const struct ib_gid_attr *av_sgid_attr = qp->av_sgid_attr;
>> 	struct ib_pd *pd;
>> 	struct ib_cq *scq, *rcq;
>> 	struct ib_srq *srq;
>> 	struct ib_rwq_ind_table *ind_tbl;
>> 	struct ib_qp_security *sec;
>> 	int ret;
>>
>> 	WARN_ON_ONCE(qp->mrs_used > 0);
>>
>> 	if (atomic_read(&qp->usecnt))
>> 		return -EBUSY;
>>
>> 	if (qp->real_qp != qp)
>> 		return __ib_destroy_shared_qp(qp);
>>
>> 	pd   = qp->pd;
>> 	scq  = qp->send_cq;
>> 	rcq  = qp->recv_cq;
>> 	srq  = qp->srq;
>> 	ind_tbl = qp->rwq_ind_tbl;
>> 	sec  = qp->qp_sec;
>> 	if (sec)
>> 		ib_destroy_qp_security_begin(sec);
>>
>> 	if (!qp->uobject)
>> 		rdma_rw_cleanup_mrs(qp);
>>
>> 	rdma_restrack_del(&qp->res);
>> 	ret = qp->device->ops.destroy_qp(qp);
>> 	if (!ret) {
>> 		if (alt_path_sgid_attr)
>> 			rdma_put_gid_attr(alt_path_sgid_attr);
>> 		if (av_sgid_attr)
>> 			rdma_put_gid_attr(av_sgid_attr);
>> 		if (pd)
>> 			atomic_dec(&pd->usecnt);
>> 		if (scq)
>> 			atomic_dec(&scq->usecnt);
>> 		if (rcq)
>> 			atomic_dec(&rcq->usecnt);
>> 		if (srq)
>> 			atomic_dec(&srq->usecnt);
>> 		if (ind_tbl)
>> 			atomic_dec(&ind_tbl->usecnt);
>> 		if (sec)
>> 			ib_destroy_qp_security_end(sec);
>> 	} else {
>> 		if (sec)
>> 			ib_destroy_qp_security_abort(sec);
>> 	}
>>
>> 	return ret;
>> }
>>
>>
>> My question is what the condieration about the  resource destroy mechanism when roce driver returns error?
> Like all other destroy calls, the QP destroy should be converted do not
> fail. QPs and MRs are hardest part of my conversion series and amount
> of code needed to be fixed in drivers is astonishing.
Thanks.
>> Or I understand it wrong?
>>
>> Thanks
>> Lijun Ou
>>
> .
>





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux