RE: [PATCH RESEND for-next 5/9] IB/{rdmavt, qib, hfi1}: Use new routine to release reference counts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> 
> On Wed, Apr 17, 2019 at 08:42:52PM -0700, Ira Weiny wrote:
> > On Wed, Apr 17, 2019 at 03:20:05AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Apr 15, 2019 at 08:52:52PM +0000, Ruhl, Michael J wrote:
> > >
> > > > We do need the reference count because the AH is used by the
> > > > asynchronous send engine.
> > >
> > > Drivers cannot use refcounts to manage their object lifetimes.
> > > Destroy must be synchronous.
> >
> > I admit that I'm not really following along but it seems wrong to tell
> > drivers that they can't refcount their objects...  One would think
> > that a refcount of their objects is safer than some of the locking hoops we
> have seen...
> 
> It is not safer, it just pushed bugs into the ULPs.

Ok I see what you are saying and I agree...  For objects which are controlled by the core.  See below...

> 
> Drivers cannot fail destroy - and they cannot continue to hold onto related
> references after destroy returns (ie a QP cannot continue to block destruction
> of a PD). In almost all cases this means the object must actually be destroyed
> during destroy.
> 
> The most we could offer a driver is to hold the backing object memory with a
> kref, so the driver could allow other threads to still see the dead object after
> destroy succeeds. This might be what hfi needs.

Maybe.  I think this is part of the problem with having objects which the core controls but which the drivers have critical data in.  I'm not sure about performance but if the driver needs something to hold onto for internal use then that object should be separate from the core object.  Or as you say destroy must block.

The way things are structured now I agree with you.

> 
> Basically once destroyed an object can have no futher impact on any other
> object.

I agree.

> 
> > Do you mean they can't (don't need to) refcount objects which the core
> > is responsible for like QP, AH, etc or all objects?
> 
> You can refcount them, but then you have to wait for the refcount to go to
> zero during destroy, and odds are good that will cause the driver to deadlock
> itself.

Ok I think the answer to my question is "yes we are speaking of objects the core is tracking for the driver on behalf of the user".

> 
> Strategies for destroy that gauarentee forward progress are better, ie
> serialize with parallel thread, use locking, etc.

Agreed

> 
> This is why I also don't like the idea of destroy being non-sleepable...

Agreed...  It does depend on the situation.  I just wanted to make it clear that this is not some hard and fast rule that "RDMA drivers shall never do reference counting."  I think in some situations for __internal__ resources it is appropriate.

Make sense?
Ira

> 
> Jason



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux