On Sat, Mar 23, 2019 at 8:51 PM NeilBrown <neilb@xxxxxxxx> wrote: > > > The patch title > locks: ignore same lock in blocked_lock_hash > > is a bit misleading the lock isn't in the hash, but it is linked from > something that is. Maybe > > locks: ignore same lock in posix_locks_deadlock() > > ?? > > On Sat, Mar 23 2019, Jeff Layton wrote: > > > Andreas reported that he was seeing the tdbtorture test fail in > > some cases with -EDEADLCK when it wasn't before. Some debugging > > showed that deadlock detection was sometimes discovering the > > caller's lock request itself in a dependency chain. > > > > If posix_locks_deadlock() fails to find a deadlock, the caller_fl > > will be passed to __locks_insert_block(), and this wakes up all > > locks that are blocked on caller_fl, clearing the fl_blocker link. > > > > So if posix_locks_deadlock() finds caller_fl while searching for > > a deadlock, it can be sure that link in the cycle is about to be > > broken and it need not treat it as the cause of a deadlock. > > > > URL: https://bugzilla.kernel.org/show_bug.cgi?id=202975 > > Fixes: 5946c4319ebb ("fs/locks: allow a lock request to block other requests.") > > Cc: stable@xxxxxxxxxxxxxxx > > Reported-by: Andreas Schneider <asn@xxxxxxxxxx> > > Signed-off-by: Neil Brown <neilb@xxxxxxxx> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > --- > > fs/locks.c | 13 +++++++++++++ > > 1 file changed, 13 insertions(+) > > > > diff --git a/fs/locks.c b/fs/locks.c > > index eaa1cfaf73b0..a939a274dc71 100644 > > --- a/fs/locks.c > > +++ b/fs/locks.c > > @@ -1023,6 +1023,19 @@ static int posix_locks_deadlock(struct file_lock *caller_fl, > > while ((block_fl = what_owner_is_waiting_for(block_fl))) { > > if (i++ > MAX_DEADLK_ITERATIONS) > > return 0; > > + > > + /* > > + * It's possible that we're retrying this lock request after > > + * another task is has blocked on it. A lock request can't > > + * block itself, and any locks that are blocked on it will > > + * also be awoken soon (and have their fl_blocker pointer > > + * cleared). Any dependency chain that contains the request > > + * itself is therefore about to be broken, so we can safely > > + * ignore it. > > That first sentence isn't working for me .... maybe remove the "is" ?? > > Thanks, > NeilBrown > > > > + */ > > + if (block_fl == caller_fl) > > + return 0; > > + > > if (posix_same_owner(caller_fl, block_fl)) > > return 1; > > } > > -- > > 2.20.1 Makes sense. That said, I think I'm starting to like your earlier idea of just waking up any locks blocked on this request prior to the deadlock check. I've tested that and it also fixes this, and may be a bit more efficient. I'll follow up with a v3 patch that does that. Thanks, -- Jeff Layton <jlayton@xxxxxxxxxxxxxxx>