On Thu, Aug 27, 2020 at 11:29:44PM +0000, Saleem, Shiraz wrote:
> > Subject: Re: [PATCH rdma-next 01/10] RDMA: Restore ability to fail on PD
> > deallocate
> >
> > On Thu, Aug 27, 2020 at 02:06:03AM +0000, Saleem, Shiraz wrote:
> >
> > > Which then boils down to: do we just keep a simpler definition of the API
> > > contract -- driver can just return whatever the true error code is?
> >
> > No, that was always wrong. In almost every case returning codes from destroy is a
> > driver bug, flat out. It causes the kernel to leak memory, or worse, and
> > unrecoverable userspace failures.
> >
> seems like we are opening a can then.

It is not something new, it has always been like this, with these
rules. The effort to remove the return codes simply failed :(

> I can see a new provider seeing the int return type and returning error codes.
> And maybe being stumped by seeing some providers ignoring device errors and faking a success.
> And one provider returning error codes.

No, things can't ignore device failures. If the provider can't shut
down a rogue device then it must return an error, leak memory and
accept the WARN_ON. Otherwise the device will cause memory corruption
by DMA'ing to memory that has been freed.

Having an RDMA driver that can do recovery from HW errors via device
reset is really required to close these edge cases. I suspect no RDMA
driver gets this all right today.

Jason
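
For illustration only, the destroy rule above could be sketched in plain
userspace C roughly as follows. Everything here is hypothetical:
demo_pd, demo_hw_destroy_pd() and demo_dealloc_pd() are made-up names
and not the RDMA core API, the fprintf() merely stands in for the
kernel's WARN_ON, and malloc'ed memory stands in for DMA-mapped memory.
The point is only the ordering: free memory after the device has
confirmed it stopped using it, otherwise return the error and leak.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PD whose memory the device may DMA into. */
struct demo_pd {
	bool hw_responsive;	/* false models a rogue device ignoring commands */
	void *dma_buf;		/* memory the device may still be writing to */
};

/* Hypothetical: ask the device to stop using the PD. */
static int demo_hw_destroy_pd(struct demo_pd *pd)
{
	return pd->hw_responsive ? 0 : -EIO;
}

/*
 * Returns 0 only when the device has released the object and the
 * memory was freed. On device failure the memory is deliberately
 * leaked and the error is returned, because freeing it would let the
 * device DMA into memory that has been reused.
 */
static int demo_dealloc_pd(struct demo_pd *pd)
{
	int ret = demo_hw_destroy_pd(pd);

	if (ret) {
		/* Stand-in for the kernel's WARN_ON in this sketch. */
		fprintf(stderr, "WARN: device kept PD, leaking memory\n");
		return ret;
	}

	free(pd->dma_buf);
	pd->dma_buf = NULL;
	return 0;
}
```

What the sketch avoids is the "fake a success" path: returning 0 and
freeing dma_buf when demo_hw_destroy_pd() failed. A real driver would
additionally want a device reset path so that the leak case can be
recovered from, which this sketch does not attempt.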