On Thu, Apr 18, 2019 at 04:11:56PM +0000, Weiny, Ira wrote: > > > > On Wed, Apr 17, 2019 at 08:42:52PM -0700, Ira Weiny wrote: > > > On Wed, Apr 17, 2019 at 03:20:05AM -0300, Jason Gunthorpe wrote: > > > > On Mon, Apr 15, 2019 at 08:52:52PM +0000, Ruhl, Michael J wrote: > > > > > > > > > We do need the reference count because the AH is used by the > > > > > asynchronous send engine. > > > > > > > > Drivers cannot use refcounts to manage their object lifetimes. > > > > Destroy must be synchronous. > > > > > > I admit that I'm not really following along but it seems wrong to tell > > > drivers that they can't refcount their objects... One would think > > > that a refcount of their objects is safer than some of the locking hoops we > > have seen... > > > > It is not safer, it just pushed bugs into the ULPs. > > Ok I see what you are saying and I agree... For objects which are > controlled by the core. See below... No, for all objects that interact with the defined uverbs API. Ie verbs objects. It doesn't matter if the core or driver allocates the memory, the rules are the same. > > Drivers cannot fail destroy - and they cannot continue to hold onto related > > references after destroy returns (ie a QP cannot continue to block destruction > > of a PD). In almost all cases this means the object must actually be destroyed > > during destroy. > > > > The most we could offer a driver is to hold the backing object memory with a > > kref, so the driver could allow other threads to still see the dead object after > > destroy succeeds. This might be what hfi needs. > > Maybe. I think this is part of the problem with having objects > which the core controls but which the drivers have critical data in. > I'm not sure about performance but if the driver needs something to > hold onto for internal use then that object should be separate from > the core object. Or as you say destroy must block. Or the driver works with the core to extend the life of the backing memory (kref, rcu, etc) even though the object still must be destroyed. > > This is why I also don't like the idea of destroy being non-sleepable... > > Agreed... It does depend on the situation. I just wanted to make > it clear that this is not some hard and fast rule that "RDMA drivers > shall never do reference counting." I think in some situations for > __internal__ resources it is appropriate. Maybe, but most often cases I see trying to do 'refcounts' without sleeping are just some kind of broken kref. Jason