Re: [PATCH] locks: ignore same lock in blocked_lock_hash

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Fri, 22 Mar 2019 16:27:51 -0400

On Fri, Mar 22, 2019 at 12:58:47PM +1100, NeilBrown wrote:
> On Thu, Mar 21 2019, J. Bruce Fields wrote:
> 
> > On Thu, Mar 21, 2019 at 07:27:44AM -0400, Jeff Layton wrote:
> >> Andreas reported that he was seeing the tdbtorture test fail in
> >> some cases with -EDEADLCK when it wasn't before. Some debugging
> >> showed that deadlock detection was sometimes discovering the
> >> caller's lock request itself in a dependency chain.
> >> 
> >> If posix_locks_deadlock() fails to find a deadlock, the caller_fl
> >> will be passed to __locks_insert_block(), and this wakes up all
> >> locks that are blocked on caller_fl, clearing the fl_blocker link.
> >> 
> >> So if posix_locks_deadlock() finds caller_fl while searching for
> >> a deadlock,
> >
> > I'm feeling dense.  Could you step me through the scenario in a little
> > more detail?
> 
> Three processes: A, B, C.  One byte in one file: X
> 
> A gets a read lock on X.  Doesn't block, so nothing goes in the hash
>   table.
> 
> B gets a read lock on X.  Doesn't block.
> 
> C tries to get a write lock on X.  It cannot so it can block on either A
>   or B.  It chooses B.  It check if this might cause a deadlock, but the
>   hash is empty so it doesn't.  It gets added to the hash. "C -> B"
> 
> A tries to get a write lock on X. It cannot because B has a read lock,
>   so it needs to block on B, or some child.  C is a child and A's
>   request conflicts with C's request, so A blocks on C.
>   It first checks: Does A->C cause a loop in the hash-table?  No. So
>   it proceeds to block and adds  "A->C" to the hash table.
> 
> B drops its read lock. This wakes up the first child: C.
> 
> C tries (again) to get a write lock on X.  It cannot because A still
>   holds a read lock, so it will have to block on A.  First it must
>   check if "C->A" will cause a loop in the hash-table ... Yes it will
>   because "A->C" is already there - error!
> 
> This is a false-positive because the very next thing to be done after
> posix_locks_deadlock() (while still holding all spinlocks), is
> __locks_insert_block(C->A) which calls __locks_wake_up_blocks(C) which
> calls __locks_delete_block() for any pending block ??->C and so will
> remove A->C from the hash.
> 
> Maybe we could call __locks_wake_up_block() *before* calling
> posix_locks_deadlock().  That *might* make more sense. 
>   
> 
> >
> > Also, how do we know this catches every such case?
> 
> The only links in the hash table that are about to be removed are those
> that record that something is blocked by C (the lock we are trying to
> get).  So these are the only ones that it is reasonable to special-case
> in posix_locks_deadlock().
> 
> >
> > And why aren't we unhashing blocks when we wake them up?
> 
> We do - that is exactly when we unhash them.  The problem is they we
> wake them up *after* we check for errors, rather than before.

Thanks!

OK, somehow I overlooked
__locks_wake_up_blocks->__locks_delete_block->locks_delete_global_block.

And I missed that all this was happening while still under the
blocked_lock_lock, so we don't have to worry about other tasks leaving
locks in this state.

It's possible I'm still being a little dense, but the new check could
really use a careful comment (maybe even as detailed as the above).

--b.