On Sun, Apr 25, 2021 at 10:08:57AM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 25, 2021 at 04:03:47PM +0300, Leon Romanovsky wrote:
> > On Thu, Apr 22, 2021 at 11:29:39AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 21, 2021 at 08:03:22AM +0300, Leon Romanovsky wrote:
> > >
> > > > I didn't understand when reviewed either, but decided to post it anyway
> > > > to get possible explanation for this RDMA_RESTRACK_MR || RDMA_RESTRACK_QP
> > > > check.
> > >
> > > I think the whole thing should look more like this and we delete the
> > > if entirely.
> >
> > I have mixed feelings about this approach. Before "destroy can fail disaster",
> > the restrack goal was to provide the following flow:
> > 1. create new memory object - rdma_restrack_new()
> > 2. create new HW object - .create_XXX() callback in the driver
> > 3. add HW object to the DB - rdma_restrack_add()
> > ....
> > 4. wait for any work on this HW object to complete - rdma_restrack_del()
> > 5. safely destroy HW object - .destroy_XXX()
> >
> > I really would like to stay with this flow and block any access to the
> > object that failed to destruct - maybe add to some zombie list.
>
> That isn't the semantic we now have for destroy.

I would say that was my mistake, introduced when I changed destroy to
return an error.

> > The proposed prepare/abort/finish flow is much harder to implement correctly.
> > Let's take as an example ib_destroy_qp_user(), we called rdma_rw_cleanup_mrs(),
> > but didn't restore them after .destroy_qp() failure.
>
> I think it is a bug we call rdma_rw code in a user path.

It was an example of a flow that wasn't restored properly. The same goes
for ib_dealloc_pd_user() and its release of __internal_mr.

Of course, these flows shouldn't fail because they are kernel flows, but
that is not clear from the code.

Thanks

>
> Jason
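
To make the five-step flow and the "zombie list" idea concrete, here is a
rough userspace toy model. It is only a sketch: every name in it (toy_obj,
toy_new(), toy_add(), the zombie list and so on) is made up for illustration
and is not the restrack API, and there is no locking or refcounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
enum toy_state { TOY_NEW, TOY_TRACKED, TOY_UNTRACKED, TOY_DESTROYED, TOY_ZOMBIE };

struct toy_obj {
	enum toy_state state;
	bool hw_created;
	struct toy_obj *next_zombie;
};

/* Objects whose HW destroy failed are parked here and never looked up again. */
static struct toy_obj *zombie_list;

/* 1. create new memory object */
static void toy_new(struct toy_obj *o)
{
	o->state = TOY_NEW;
	o->hw_created = false;
	o->next_zombie = NULL;
}

/* 2. create new HW object; stands in for a driver .create_XXX() callback */
static int toy_hw_create(struct toy_obj *o, bool fail)
{
	if (fail)
		return -1;
	o->hw_created = true;
	return 0;
}

/* 3. add HW object to the DB; only now does it become visible */
static void toy_add(struct toy_obj *o)
{
	assert(o->state == TOY_NEW && o->hw_created);
	o->state = TOY_TRACKED;
}

/* 4. remove from the DB; a real version would also wait for users to finish */
static void toy_del(struct toy_obj *o)
{
	assert(o->state == TOY_TRACKED);
	o->state = TOY_UNTRACKED;
}

/* 5. destroy HW object; on failure keep the object hidden on the zombie list */
static int toy_destroy(struct toy_obj *o, bool hw_fails)
{
	assert(o->state == TOY_UNTRACKED);
	if (hw_fails) {
		o->state = TOY_ZOMBIE;
		o->next_zombie = zombie_list;
		zombie_list = o;
		return -1;
	}
	o->hw_created = false;
	o->state = TOY_DESTROYED;
	return 0;
}

/* Lookups only ever see objects that are currently in the DB. */
static bool toy_accessible(const struct toy_obj *o)
{
	return o->state == TOY_TRACKED;
}
```

The point of the model is that a failed destroy never puts the object back
into the DB, so nothing has to be "restored"; the object simply stays
unreachable.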