> Subject: Re: [PATCH rdma-next 01/10] RDMA: Restore ability to fail on PD > deallocate > > On Wed, Aug 26, 2020 at 12:49:03AM +0000, Saleem, Shiraz wrote: > > > The API is quite confusing now. If drivers are not expected to fail > > the destroy and there is no way to propagate the device failures, then > > the return type should be a void. > > More or less, drivers can only return -EAGAIN with the idea that a future call during > the close process will eventually succeed. > > Any permanent failure will trigger WARN_ON and a memory leak > > Maybe we should switch the return code to bool or something to be a little clearer > that it is request to retry, not a failure? > There is no retry for kernel object destroy right? So not sure bool is making It clearer. I am not very familiar with devx flows but doesn’t it bypass the ib verbs layer altogether? i.e. mlx5_ib_dealloc_pd isn’t directly called in the devx flows no? So why changes its return type and other provider destroy callbacks? But lets go down the path that we really need a return code in the destroy APIs to solve this problem. For one I don’t see how we can say its meant exclusively for devx drivers to use for a fail. Also can we really claim the API contract is that driver can fail a destroy given a future destroy will succeed? Since the kernel destroy has no retry. Which then boils down do we just keep a simpler definition of the API contract -- driver can just return whatever the true error code is? i.e. if it wants a retry, use EAGAIN. If it has a non recoverable device error, then reset the device, clean up the resources but return ENOTRECOVERABLE. ib_core can enable the retry logic for EAGAIN _only_. For other error codes, Ib_core can trigger a warn_on or something to indicate permanent failure. It can also pass on ret_code to user-space as its doing today? Shiraz