On Mon, 17 Jun 2013 11:13:49 -0400 Jeff Layton <jlayton@xxxxxxxxxx> wrote: > Having a global lock that protects all of this code is a clear > scalability problem. Instead of doing that, move most of the code to be > protected by the i_lock instead. The exceptions are the global lists > that the ->fl_link sits on, and the ->fl_block list. > > ->fl_link is what connects these structures to the > global lists, so we must ensure that we hold those locks when iterating > over or updating these lists. > > Furthermore, sound deadlock detection requires that we hold the > blocked_list state steady while checking for loops. We also must ensure > that the search and update to the list are atomic. > > For the checking and insertion side of the blocked_list, push the > acquisition of the global lock into __posix_lock_file and ensure that > checking and update of the blocked_list is done without dropping the > lock in between. > > On the removal side, when waking up blocked lock waiters, take the > global lock before walking the blocked list and dequeue the waiters from > the global list prior to removal from the fl_block list. > > With this, deadlock detection should be race free while we minimize > excessive file_lock_lock thrashing. > > Finally, in order to avoid a lock inversion problem when handling > /proc/locks output we must ensure that manipulations of the fl_block > list are also protected by the file_lock_lock. > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > --- > Documentation/filesystems/Locking | 21 ++++-- > fs/afs/flock.c | 5 +- > fs/ceph/locks.c | 2 +- > fs/ceph/mds_client.c | 8 +- > fs/cifs/cifsfs.c | 2 +- > fs/cifs/file.c | 13 ++-- > fs/gfs2/file.c | 2 +- > fs/lockd/svcsubs.c | 12 ++-- > fs/locks.c | 151 ++++++++++++++++++++++--------------- > fs/nfs/delegation.c | 10 +- > fs/nfs/nfs4state.c | 8 +- > fs/nfsd/nfs4state.c | 8 +- > include/linux/fs.h | 11 --- > 13 files changed, 140 insertions(+), 113 deletions(-) > [...] > @@ -1231,7 +1254,7 @@ int __break_lease(struct inode *inode, unsigned int mode) > if (IS_ERR(new_fl)) > return PTR_ERR(new_fl); > > - lock_flocks(); > + spin_lock(&inode->i_lock); > > time_out_leases(inode); > > @@ -1281,11 +1304,11 @@ restart: > break_time++; > } > locks_insert_block(flock, new_fl); > - unlock_flocks(); > + spin_unlock(&inode->i_lock); > error = wait_event_interruptible_timeout(new_fl->fl_wait, > !new_fl->fl_next, break_time); > - lock_flocks(); > - __locks_delete_block(new_fl); > + spin_lock(&inode->i_lock); > + locks_delete_block(new_fl); Doh -- bug here. This should not have been changed to locks_delete_block(). My apologies. > if (error >= 0) { > if (error == 0) > time_out_leases(inode); [...] > posix_unblock_lock(struct file *filp, struct file_lock *waiter) > { > + struct inode *inode = file_inode(filp); > int status = 0; > > - lock_flocks(); > + spin_lock(&inode->i_lock); > if (waiter->fl_next) > - __locks_delete_block(waiter); > + locks_delete_block(waiter); Ditto here... > else > status = -ENOENT; > - unlock_flocks(); > + spin_unlock(&inode->i_lock); > return status; > } > -- Jeff Layton <jlayton@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html