On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> On Wed, Jul 03, 2013 at 04:12:25PM -0400, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <bfields@xxxxxxxxxx>
> >
> > We want to do this elsewhere as well.
> >
> > Cc: "Theodore Ts'o" <tytso@xxxxxxx>
> > Cc: Andreas Dilger <adilger.kernel@xxxxxxxxx>
> > Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
> > ---
> >  fs/ext4/ext4.h        |  2 --
> >  fs/ext4/ioctl.c       |  4 ++--
> >  fs/ext4/move_extent.c | 40 ++--------------------------------------
> >  fs/inode.c            | 29 +++++++++++++++++++++++++++++
> >  include/linux/fs.h    |  3 +++
> >  5 files changed, 36 insertions(+), 42 deletions(-)
> >
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 5aae3d1..3590abe 100644

Thanks for the comment:

> Just to throw a spanner in the works - have you considered that
> other filesystems might have different inode lock ordering rules?
>
> For example, XFS locks multiple inodes in ascending inode number
> order, not ordered by pointer address. Hence we end up different
> inode lock ordering at different layers of the stack and I can't see
> that ending well....

Which lock(s) is it taking, exactly, and where?  If there's a possible
deadlock, can we come up with a compatible ordering?
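To make the disagreement concrete: lock_two_nondirectories() orders the pair by pointer address, while Dave describes XFS as ordering by ascending inode number. The rough user-space sketch below (struct toy_inode and the helper names are hypothetical, made up for illustration) only checks whether the two rules pick a different inode to lock first for the same pair. Whether such a disagreement can actually deadlock depends on which locks end up nested inside which, which is the open question above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode: only the inode number matters here. */
struct toy_inode {
	unsigned long ino;
};

/* Rule used by lock_two_nondirectories(): lower address is locked first. */
static const struct toy_inode *first_by_address(const struct toy_inode *a,
						const struct toy_inode *b)
{
	/* Compare as integers; ISO C leaves "<" on unrelated pointers undefined. */
	return (uintptr_t)a < (uintptr_t)b ? a : b;
}

/* Rule described for XFS: lower inode number is locked first. */
static const struct toy_inode *first_by_ino(const struct toy_inode *a,
					    const struct toy_inode *b)
{
	return a->ino < b->ino ? a : b;
}

/* Nonzero if the two rules would lock the same pair in opposite orders. */
static int orders_conflict(const struct toy_inode *a, const struct toy_inode *b)
{
	assert(a != b);	/* only meaningful for two distinct inodes */
	return first_by_address(a, b) != first_by_ino(a, b);
}
```

For example, with struct toy_inode pair[2], pair[0].ino = 200 and pair[1].ino = 100, orders_conflict(&pair[0], &pair[1]) returns 1 on a typical flat-address platform: the address rule picks pair[0] and the inode-number rule picks pair[1].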
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 00d5fc3..b8afbc7 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
> >  EXPORT_SYMBOL(unlock_new_inode);
> >  
> >  /**
> > + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> > + * @inode1: first inode to lock
> > + * @inode2: second inode to lock
> > + */
> > +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> > +{
> > +	if (inode1 < inode2) {
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> > +	} else {
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> > +	}
> > +}
> > +EXPORT_SYMBOL(lock_two_nondirectories);
>
> What makes this specific to non-directories?

See http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields@xxxxxxxxxx>

The only caller outside ext4 is vfs_rename_other.  I think we could
make it work for directories too if necessary, though the ordering
would be more complicated.  Currently there's no reason to.

> If it's not to be used for directory inodes, then there should be
> WARN_ON_ONCE() guards in the code...

Sure, so something like the following.

Hm.  I also overlooked that ext4 had a BUG() for the case where the two
inodes are equal.  Maybe we should keep that too if it's not overkill.

--b.

commit ad9a94b0e91d6057734e9835782e0c2cdc148bdc
Author: J. Bruce Fields <bfields@xxxxxxxxxx>
Date:   Wed Apr 18 15:16:33 2012 -0400

    vfs: pull ext4's double-i_mutex-locking into common code
    
    We want to do this elsewhere as well.
    
    Also catch any attempts to use it for directories (where this ordering
    would conflict with ancestor-first directory ordering in lock_rename).
    
    Cc: Andreas Dilger <adilger.kernel@xxxxxxxxx>
    Cc: Dave Chinner <david@xxxxxxxxxxxxx>
    Acked-by: Jeff Layton <jlayton@xxxxxxxxxx>
    Acked-by: "Theodore Ts'o" <tytso@xxxxxxx>
    Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5aae3d1..3590abe 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2642,8 +2642,6 @@ extern void ext4_double_down_write_data_sem(struct inode *first,
 					    struct inode *second);
 extern void ext4_double_up_write_data_sem(struct inode *orig_inode,
 					  struct inode *donor_inode);
-void ext4_inode_double_lock(struct inode *inode1, struct inode *inode2);
-void ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2);
 extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
 			     __u64 start_orig, __u64 start_donor,
 			     __u64 len, __u64 *moved_len);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 9491ac0..12048f7 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -129,7 +129,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 
 	/* Protect orig inodes against a truncate and make sure,
 	 * that only 1 swap_inode_boot_loader is running. */
-	ext4_inode_double_lock(inode, inode_bl);
+	lock_two_nondirectories(inode, inode_bl);
 
 	truncate_inode_pages(&inode->i_data, 0);
 	truncate_inode_pages(&inode_bl->i_data, 0);
@@ -204,7 +204,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	ext4_inode_resume_unlocked_dio(inode);
 	ext4_inode_resume_unlocked_dio(inode_bl);
 
-	ext4_inode_double_unlock(inode, inode_bl);
+	unlock_two_nondirectories(inode, inode_bl);
 
 	iput(inode_bl);
 
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 3dcbf36..986a838 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1206,42 +1206,6 @@ mext_check_arguments(struct inode *orig_inode,
 }
 
 /**
- * ext4_inode_double_lock - Lock i_mutex on both @inode1 and @inode2
- *
- * @inode1:	the inode structure
- * @inode2:	the inode structure
- *
- * Lock two inodes' i_mutex
- */
-void
-ext4_inode_double_lock(struct inode *inode1, struct inode *inode2)
-{
-	BUG_ON(inode1 == inode2);
-	if (inode1 < inode2) {
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
-	} else {
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
-	}
-}
-
-/**
- * ext4_inode_double_unlock - Release i_mutex on both @inode1 and @inode2
- *
- * @inode1:     the inode that is released first
- * @inode2:     the inode that is released second
- *
- */
-
-void
-ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2)
-{
-	mutex_unlock(&inode1->i_mutex);
-	mutex_unlock(&inode2->i_mutex);
-}
-
-/**
  * ext4_move_extents - Exchange the specified range of a file
  *
  * @o_filp:		file structure of the original file
@@ -1330,7 +1294,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
 		return -EINVAL;
 	}
 	/* Protect orig and donor inodes against a truncate */
-	ext4_inode_double_lock(orig_inode, donor_inode);
+	lock_two_nondirectories(orig_inode, donor_inode);
 
 	/* Wait for all existing dio workers */
 	ext4_inode_block_unlocked_dio(orig_inode);
@@ -1538,7 +1502,7 @@ out:
 	ext4_double_up_write_data_sem(orig_inode, donor_inode);
 	ext4_inode_resume_unlocked_dio(orig_inode);
 	ext4_inode_resume_unlocked_dio(donor_inode);
-	ext4_inode_double_unlock(orig_inode, donor_inode);
+	unlock_two_nondirectories(orig_inode, donor_inode);
 
 	return ret;
 }
diff --git a/fs/inode.c b/fs/inode.c
index 00d5fc3..8f3c6fa 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -980,6 +980,37 @@ void unlock_new_inode(struct inode *inode)
 EXPORT_SYMBOL(unlock_new_inode);
 
 /**
+ * lock_two_nondirectories - take two i_mutexes on non-directory objects
+ * @inode1: first inode to lock
+ * @inode2: second inode to lock
+ */
+void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	WARN_ON_ONCE(S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode));
+	WARN_ON_ONCE(inode1 == inode2);
+	if (inode1 < inode2) {
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	} else {
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+	}
+}
+EXPORT_SYMBOL(lock_two_nondirectories);
+
+/**
+ * unlock_two_nondirectories - release locks from lock_two_nondirectories()
+ * @inode1: first inode to unlock
+ * @inode2: second inode to unlock
+ */
+void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	mutex_unlock(&inode1->i_mutex);
+	mutex_unlock(&inode2->i_mutex);
+}
+EXPORT_SYMBOL(unlock_two_nondirectories);
+
+/**
  * iget5_locked - obtain an inode from a mounted file system
  * @sb: super block of file system
  * @hashval: hash value (usually inode number) to get
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 65c2be2..3258761 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -634,6 +634,9 @@ enum inode_i_mutex_lock_class
 	I_MUTEX_QUOTA
 };
 
+void lock_two_nondirectories(struct inode *, struct inode*);
+void unlock_two_nondirectories(struct inode *, struct inode*);
+
 /*
  * NOTE: in a 32bit arch with a preemptable kernel and
  * an UP compile the i_size_read/write must be atomic
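For experimenting outside the kernel, here is a rough user-space analogue of lock_two_nondirectories()/unlock_two_nondirectories() built on POSIX threads. It is a sketch under stated assumptions, not the kernel code: struct toy_inode, toy_lock_two(), toy_unlock_two(), worker() and run_demo() are hypothetical names, assert() is a stricter stand-in for WARN_ON_ONCE(), and the I_MUTEX_PARENT/I_MUTEX_CHILD lockdep subclasses are dropped because pthreads has no equivalent. Two threads pass the same pair in opposite argument orders (the classic ABBA pattern); because both always lock the lower address first, neither can hold one mutex while waiting for the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode: a mutex and a directory flag. */
struct toy_inode {
	pthread_mutex_t i_mutex;
	int is_dir;
};

/* Analogue of lock_two_nondirectories(): always lock the lower address first. */
static void toy_lock_two(struct toy_inode *a, struct toy_inode *b)
{
	assert(!a->is_dir && !b->is_dir);	/* stricter stand-in for WARN_ON_ONCE() */
	assert(a != b);				/* relocking one mutex is unsafe */
	/* Compare as integers; ISO C leaves "<" on unrelated pointers undefined. */
	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(&a->i_mutex);
		pthread_mutex_lock(&b->i_mutex);
	} else {
		pthread_mutex_lock(&b->i_mutex);
		pthread_mutex_lock(&a->i_mutex);
	}
}

/* Analogue of unlock_two_nondirectories(): release order does not matter. */
static void toy_unlock_two(struct toy_inode *a, struct toy_inode *b)
{
	pthread_mutex_unlock(&a->i_mutex);
	pthread_mutex_unlock(&b->i_mutex);
}

static struct toy_inode x = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct toy_inode y = { PTHREAD_MUTEX_INITIALIZER, 0 };
static long counter;	/* only modified while both i_mutexes are held */

/* Each worker passes the pair in a different argument order (ABBA pattern). */
static void *worker(void *arg)
{
	int reversed = *(const int *)arg;
	struct toy_inode *first = reversed ? &y : &x;
	struct toy_inode *second = reversed ? &x : &y;

	for (int i = 0; i < 100000; i++) {
		toy_lock_two(first, second);
		counter++;
		toy_unlock_two(first, second);
	}
	return NULL;
}

/* Run both workers to completion and return the final counter. */
static long run_demo(void)
{
	pthread_t t1, t2;
	int fwd = 0, rev = 1;

	counter = 0;
	pthread_create(&t1, NULL, worker, &fwd);
	pthread_create(&t2, NULL, worker, &rev);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return counter;
}
```

run_demo() should return 200000 (two threads times 100000 iterations). If toy_lock_two() instead locked its arguments in the order given, the same loop could hang with each thread holding one mutex.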