Re: [PATCH rdma-next 01/10] RDMA: Restore ability to fail on PD deallocate


On 25/08/2020 16:07, Jason Gunthorpe wrote:
> On Tue, Aug 25, 2020 at 03:12:07PM +0300, Gal Pressman wrote:
>> On 25/08/2020 14:52, Jason Gunthorpe wrote:
>>> On Tue, Aug 25, 2020 at 11:13:25AM +0300, Gal Pressman wrote:
>>>> On 24/08/2020 13:32, Leon Romanovsky wrote:
>>>>> diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
>>>>> index 1889dd172a25..8547f9d543df 100644
>>>>> +++ b/drivers/infiniband/hw/efa/efa.h
>>>>> @@ -134,7 +134,7 @@ int efa_query_gid(struct ib_device *ibdev, u8 port, int index,
>>>>>  int efa_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
>>>>>  		   u16 *pkey);
>>>>>  int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
>>>>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
>>>>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
>>>>>  int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata);
>>>>>  struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
>>>>>  			    struct ib_qp_init_attr *init_attr,
>>>>> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
>>>>> index 3f7f19b9f463..660a69943e02 100644
>>>>> +++ b/drivers/infiniband/hw/efa/efa_verbs.c
>>>>> @@ -383,13 +383,14 @@ int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
>>>>>  	return err;
>>>>>  }
>>>>>
>>>>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
>>>>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
>>>>>  {
>>>>>  	struct efa_dev *dev = to_edev(ibpd->device);
>>>>>  	struct efa_pd *pd = to_epd(ibpd);
>>>>>
>>>>>  	ibdev_dbg(&dev->ibdev, "Dealloc pd[%d]\n", pd->pdn);
>>>>>  	efa_pd_dealloc(dev, pd->pdn);
>>>>> +	return 0;
>>>>>  }
>>>>
>>>> Nice change, thanks Leon.
>>>> At least for EFA, I prefer to return the return value of the destroy command
>>>> instead of silently ignoring it (same for the other patches).
>>>
>>> Drivers can't fail the destroy unless a future destroy will succeed.
>>> It breaks everything to do that.
>>
>> What does it break?
> 
> For uverbs it will go into an infinite loop in
> uverbs_destroy_ufile_hw() if destroy doesn't eventually succeed.

The code breaks out of the loop in such cases, so why would it loop infinitely?
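
To illustrate what I mean by breaking the loop, here is a minimal userspace
sketch of a cleanup loop that stops once a full pass destroys nothing. It is
not the actual uverbs_destroy_ufile_hw() code; struct demo_obj and the demo_*
helpers are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object; destroy_fails simulates a driver that keeps failing. */
struct demo_obj {
	bool destroy_fails;
	bool destroyed;
};

/* Count the objects that have not been destroyed yet. */
static size_t demo_live(const struct demo_obj *objs, size_t n)
{
	size_t live = 0;

	for (size_t i = 0; i < n; i++)
		if (!objs[i].destroyed)
			live++;
	return live;
}

/* One pass over all objects; returns how many were destroyed in this pass. */
static size_t demo_cleanup_pass(struct demo_obj *objs, size_t n)
{
	size_t progress = 0;

	for (size_t i = 0; i < n; i++) {
		if (objs[i].destroyed || objs[i].destroy_fails)
			continue;
		objs[i].destroyed = true;
		progress++;
	}
	return progress;
}

/*
 * Repeat passes until nothing is left or a pass makes no progress.
 * Returns the number of objects that could not be destroyed.
 */
static size_t demo_cleanup_all(struct demo_obj *objs, size_t n)
{
	assert(objs != NULL || n == 0);

	while (demo_live(objs, n) > 0) {
		if (demo_cleanup_pass(objs, n) == 0)
			break; /* no entry was cleaned up: stop instead of spinning */
	}
	return demo_live(objs, n);
}
```

With one object that always fails, demo_cleanup_all() returns after the first
pass without progress and reports one leftover object, rather than spinning.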

> For kernel it will trigger WARN_ON's and then a permanent memory leak.
> 
>> I agree that drivers shouldn't fail destroy commands, but you know.. bugs/errors
>> happen (especially when dealing with hardware), and we have a way to propagate
>> them, why do it for only some of the drivers?
> 
> There is no way to propagate them.
> 
> All destroy must eventually succeed.

There is no way to propagate them on process cleanup, but the destroy verbs have
a return code all the way back to libibverbs, which we can use for error
propagation. The cleanup flow can either ignore the return value, or we can add
another parameter that explicitly means the call shouldn't fail and all
allocated memory/state should be freed.
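
A rough sketch of the second option, assuming a hypothetical "force" parameter.
None of these names exist in the kernel or in EFA; struct demo_pd and the
demo_* functions are stand-ins so that the example compiles on its own in
userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver PD object; not the real struct efa_pd. */
struct demo_pd {
	int pdn;
	bool hw_cmd_fails; /* simulate a failing device destroy command */
	bool freed;        /* set once the software state has been released */
};

/* Hypothetical device command: returns 0 or a negative errno. */
static int demo_hw_dealloc_pd(const struct demo_pd *pd)
{
	return pd->hw_cmd_fails ? -EIO : 0;
}

/*
 * force == false: explicit destroy verb. A device error is returned to the
 *                 caller and the object stays allocated, so a later destroy
 *                 can still succeed.
 * force == true:  process cleanup. The call must not fail, so the device
 *                 error is ignored and the state is freed anyway.
 */
static int demo_dealloc_pd(struct demo_pd *pd, bool force)
{
	int err;

	assert(pd != NULL);

	err = demo_hw_dealloc_pd(pd);
	if (err && !force)
		return err;

	pd->freed = true;
	return 0;
}
```

The explicit verb path would call demo_dealloc_pd(pd, false) and hand the
error back to the caller, while the cleanup flow would pass true and always
get 0.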

>>> If the chip fails a destroy when it should not then it has failed and
>>> should be disabled at PCI and reset, continuing to free anyhow.
>>
>> How do we reset the device when there are active apps using it?
> 
> The zap stuff revokes the BAR mmapping, it triggers device fatal to
> userspace and that is mostly it for userspace..

Interesting, is there a reference driver that does that today?


