Re: [PATCH for-next 02/17] IB/hfi1: TID RDMA flow allocation

Doug Ledford <dledford@xxxxxxxxxx> · Mon, 04 Feb 2019 14:20:02 -0500

On Mon, 2019-02-04 at 19:05 +0000, Wan, Kaike wrote:
> > 
> > and I see why you
> > have to drop the lock before calling it as it will take the lock itself.  But that
> > raises the question of why you are taking the lock here?  Are you trying to
> > flush any concurrent accessors out before you test?  If so, do you need the
> > exp_lock too?  And even if you are flushing out concurrent access, what's to
> > prevent a new concurrent access as soon as you drop the lock?
> qp->s_flags is modified all over the hfi1 driver by both the sending and receiving sides and we generally access it (read or modify it) only after acquiring the qp->s_lock to prevent obtaining any incorrect state information.

That doesn't directly address my point.  Any time you take a test for a
condition that needs to be under a lock, and then put the resulting
action you take as a result of that test outside of the lock, there is
an inherent flaw in your locking strategy that must be overcome in some
way.  As you say "only after acquiring the qp->s_lock to prevent
obtaining any incorrect state information", but the second you drop the
lock, all guarantees about that state information being correct are
lost.  So what I'm looking for here is an explanation of how/why it is
safe in this context for you to call hfi1_do_send() if, in fact, you get
interrupted right after dropping the lock and something happens to clear
out HFI1_S_WAIT_TID_SPACE (or, alternatively, why it's impossible for
such a thing to happen).  I don't want to have to take the time to
completely reverse engineer how you guys do the locking in your driver,
but when you do something that is a known wrong/dangerous thing unless
there are non-obvious guarantees that make it safe, then I'd like an
explanation from you as to why/how it's safe.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
Attachment:
signature.asc

Description: This is a digitally signed message part