On Mon, 2019-02-04 at 19:05 +0000, Wan, Kaike wrote: > > > > and I see why you > > have to drop the lock before calling it as it will take the lock itself. But that > > raises the question of why you are taking the lock here? Are you trying to > > flush any concurrent accessors out before you test? If so, do you need the > > exp_lock too? And even if you are flushing out concurrent access, what's to > > prevent a new concurrent access as soon as you drop the lock? > qp->s_flags is modified all over the hfi1 driver by both the sending and receiving sides and we generally access it (read or modify it) only after acquiring the qp->s_lock to prevent obtaining any incorrect state information. That doesn't directly address my point. Any time you take a test for a condition that needs to be under a lock, and then put the resulting action you take as a result of that test outside of the lock, there is an inherent flaw in your locking strategy that must be overcome in some way. As you say "only after acquiring the qp->s_lock to prevent obtaining any incorrect state information", but the second you drop the lock, all guarantees about that state information being correct are lost. So what I'm looking for here is an explanation of how/why it is safe in this context for you to call hfi1_do_send() if, in fact, you get interrupted right after dropping the lock and something happens to clear out HFI1_S_WAIT_TID_SPACE (or, alternatively, why it's impossible for such a thing to happen). I don't want to have to take the time to completely reverse engineer how you guys do the locking in your driver, but when you do something that is a known wrong/dangerous thing unless there are non-obvious guarantees that make it safe, then I'd like an explanation from you as to why/how it's safe. -- Doug Ledford <dledford@xxxxxxxxxx> GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Attachment:
signature.asc
Description: This is a digitally signed message part