On Wed, Apr 17, 2019 at 08:42:52PM -0700, Ira Weiny wrote:
> On Wed, Apr 17, 2019 at 03:20:05AM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 15, 2019 at 08:52:52PM +0000, Ruhl, Michael J wrote:
> >
> > > We do need the reference count because the AH is used by the asynchronous
> > > send engine.
> >
> > Drivers cannot use refcounts to manage their object lifetimes. Destroy
> > must be synchronous.
>
> I admit that I'm not really following along but it seems wrong to tell drivers
> that they can't refcount their objects... One would think that a refcount of
> their objects is safer than some of the locking hoops we have seen...

It is not safer, it just pushes the bugs into the ULPs.

Drivers cannot fail destroy - and they cannot continue to hold onto
related references after destroy returns (i.e. a QP cannot continue to
block destruction of a PD).

In almost all cases this means the object must actually be destroyed
during destroy.

The most we could offer a driver is to hold the backing object memory
with a kref, so the driver could allow other threads to still see the
dead object after destroy succeeds. This might be what hfi needs.

Basically, once destroyed, an object can have no further impact on any
other object.

> Do you mean they can't (don't need to) refcount objects which the core is
> responsible for like QP, AH, etc or all objects?

You can refcount them, but then you have to wait for the refcount to go
to zero during destroy, and odds are good that will cause the driver
to deadlock itself.

Strategies for destroy that guarantee forward progress are better,
i.e. serialize with the parallel thread, use locking, etc.

This is why I also don't like the idea of destroy being non-sleepable...

Jason
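A minimal userspace C sketch of the "hold the backing object memory with a kref" idea, under stated assumptions: every name here (`sketch_ah`, `sketch_ah_*`, `live_hw_resources`) is hypothetical and is not the kernel, rdma-core, or hfi1 API, and C11 atomics stand in for `struct kref`. Destroy never fails and releases everything that could affect other objects before it returns. A thread that took a reference earlier can still dereference the dead object's memory, which is freed only on the final put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical address-handle object (illustrative names only). */
struct sketch_ah {
	atomic_int refcount;  /* pins the backing memory, not the object */
	atomic_bool dead;     /* set once destroy has run */
	int hw_index;         /* stand-in for driver/hardware state */
};

/* Counts "hardware" state still in use; plain int, single-threaded demo. */
static int live_hw_resources;

static struct sketch_ah *sketch_ah_create(int hw_index)
{
	struct sketch_ah *ah = malloc(sizeof(*ah));

	if (!ah)
		return NULL;
	atomic_init(&ah->refcount, 1);  /* the creator's reference */
	atomic_init(&ah->dead, false);
	ah->hw_index = hw_index;
	live_hw_resources++;
	return ah;
}

/* Another thread (e.g. a send engine) pins the memory only. */
static struct sketch_ah *sketch_ah_get(struct sketch_ah *ah)
{
	atomic_fetch_add(&ah->refcount, 1);
	return ah;
}

/* Free the backing memory when the last reference is dropped. */
static void sketch_ah_put(struct sketch_ah *ah)
{
	if (atomic_fetch_sub(&ah->refcount, 1) == 1)
		free(ah);
}

/* Synchronous destroy: cannot fail, and tears down all state that could
 * affect other objects before returning. Only the memory may outlive it. */
static void sketch_ah_destroy(struct sketch_ah *ah)
{
	assert(!atomic_load(&ah->dead));
	atomic_store(&ah->dead, true);
	live_hw_resources--;  /* release hardware state now, not at last put */
	sketch_ah_put(ah);    /* drop the creator's reference */
}

/* A reference holder must check for death before acting. In real driver
 * code this check and the following use must be serialized with destroy
 * (e.g. under a lock); this sketch does not do that. */
static bool sketch_ah_usable(struct sketch_ah *ah)
{
	return !atomic_load(&ah->dead);
}
```

The design choice is that the refcount only delays `free()`; it never delays destroy itself, so destroy does not have to wait for the count to reach zero and cannot deadlock on it.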