Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code

On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> On Wed, Jul 03, 2013 at 04:12:25PM -0400, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <bfields@xxxxxxxxxx>
> > 
> > We want to do this elsewhere as well.
> > 
> > Cc: "Theodore Ts'o" <tytso@xxxxxxx>
> > Cc: Andreas Dilger <adilger.kernel@xxxxxxxxx>
> > Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
> > ---
> >  fs/ext4/ext4.h        |    2 --
> >  fs/ext4/ioctl.c       |    4 ++--
> >  fs/ext4/move_extent.c |   40 ++--------------------------------------
> >  fs/inode.c            |   29 +++++++++++++++++++++++++++++
> >  include/linux/fs.h    |    3 +++
> >  5 files changed, 36 insertions(+), 42 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 5aae3d1..3590abe 100644

Thanks for the comment:

> Just to throw a spanner in the works - have you considered that
> other filesystems might have different inode lock ordering rules?
> 
> For example, XFS locks multiple inodes in ascending inode number
> order, not ordered by pointer address. Hence we end up different
> inode lock ordering at different layers of the stack and I can't see
> that ending well....

Which lock(s) is it taking, exactly, and where?  If there's a possible
deadlock, can we come up with a compatible ordering?
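
To make the concern concrete, here is a small userspace sketch (not
kernel code) of how the two rules can disagree about which object is
locked first.  struct fake_inode and all three helper names are
hypothetical, invented for illustration.  Whether a disagreement like
this can actually deadlock depends on which locks are involved, which
is the open question above.

```c
#include <assert.h>

/* Hypothetical stand-in for an inode: only the inode number matters here. */
struct fake_inode {
	unsigned long ino;
};

/* Rule used by this patch: the object at the lower address is locked first. */
static const struct fake_inode *first_by_address(const struct fake_inode *a,
						 const struct fake_inode *b)
{
	return a < b ? a : b;
}

/* Rule described for XFS: the lower inode number is locked first. */
static const struct fake_inode *first_by_ino(const struct fake_inode *a,
					     const struct fake_inode *b)
{
	return a->ino < b->ino ? a : b;
}

/* Nonzero when the two rules would take the locks in opposite orders. */
static int orders_conflict(const struct fake_inode *a,
			   const struct fake_inode *b)
{
	return first_by_address(a, b) != first_by_ino(a, b);
}
```

The two rules conflict whenever the object at the lower address has the
higher inode number, which nothing prevents.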

> > diff --git a/fs/inode.c b/fs/inode.c
> > index 00d5fc3..b8afbc7 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
> >  EXPORT_SYMBOL(unlock_new_inode);
> >  
> >  /**
> > + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> > + * @inode1: first inode to lock
> > + * @inode2: second inode to lock
> > + */
> > +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> > +{
> > +	if (inode1 < inode2) {
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> > +	} else {
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> > +	}
> > +}
> > +EXPORT_SYMBOL(lock_two_nondirectories);
> 
> What makes this specific to non-directories?

See 

	http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields@xxxxxxxxxx>

The only caller outside ext4 is vfs_rename_other.

I think we could make it work for directories too if necessary, though
the ordering would be more complicated.  Currently there's no reason to.

> If it's not to be used for directory inodes, then there should be
> WARN_ON_ONCE() guards in the code...

Sure.  So something like the following.

Hm.  I also overlooked that ext4 had a BUG() for the case they're equal.
Maybe we should keep that too if it's not overkill.
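
For anyone who wants to play with the ordering outside the kernel, the
same idea can be sketched in userspace with pthread mutexes.  This is
only an illustration: struct obj, lock_two() and unlock_two() are
hypothetical names, and the assert() stands in for the equal-pointer
check mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode; only the lock matters here. */
struct obj {
	pthread_mutex_t lock;
};

/* Take both locks, lower address first, so every caller agrees on the order. */
static void lock_two(struct obj *a, struct obj *b)
{
	assert(a != b);	/* taking the same mutex twice would self-deadlock */
	if ((uintptr_t)a > (uintptr_t)b) {
		struct obj *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->lock);
	pthread_mutex_lock(&b->lock);
}

/* Release both locks; the release order does not matter for correctness. */
static void unlock_two(struct obj *a, struct obj *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}
```

Because the order depends only on the addresses, lock_two(x, y) and
lock_two(y, x) take the mutexes in the same sequence, so two threads
calling it with swapped arguments cannot deadlock against each other.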

--b.

commit ad9a94b0e91d6057734e9835782e0c2cdc148bdc
Author: J. Bruce Fields <bfields@xxxxxxxxxx>
Date:   Wed Apr 18 15:16:33 2012 -0400

    vfs: pull ext4's double-i_mutex-locking into common code
    
    We want to do this elsewhere as well.
    
    Also catch any attempts to use it for directories (where this ordering
    would conflict with ancestor-first directory ordering in lock_rename).
    
    Cc: Andreas Dilger <adilger.kernel@xxxxxxxxx>
    Cc: Dave Chinner <david@xxxxxxxxxxxxx>
    Acked-by: Jeff Layton <jlayton@xxxxxxxxxx>
    Acked-by: "Theodore Ts'o" <tytso@xxxxxxx>
    Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5aae3d1..3590abe 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2642,8 +2642,6 @@ extern void ext4_double_down_write_data_sem(struct inode *first,
 					    struct inode *second);
 extern void ext4_double_up_write_data_sem(struct inode *orig_inode,
 					  struct inode *donor_inode);
-void ext4_inode_double_lock(struct inode *inode1, struct inode *inode2);
-void ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2);
 extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
 			     __u64 start_orig, __u64 start_donor,
 			     __u64 len, __u64 *moved_len);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 9491ac0..12048f7 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -129,7 +129,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 
 	/* Protect orig inodes against a truncate and make sure,
 	 * that only 1 swap_inode_boot_loader is running. */
-	ext4_inode_double_lock(inode, inode_bl);
+	lock_two_nondirectories(inode, inode_bl);
 
 	truncate_inode_pages(&inode->i_data, 0);
 	truncate_inode_pages(&inode_bl->i_data, 0);
@@ -204,7 +204,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	ext4_inode_resume_unlocked_dio(inode);
 	ext4_inode_resume_unlocked_dio(inode_bl);
 
-	ext4_inode_double_unlock(inode, inode_bl);
+	unlock_two_nondirectories(inode, inode_bl);
 
 	iput(inode_bl);
 
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 3dcbf36..986a838 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1206,42 +1206,6 @@ mext_check_arguments(struct inode *orig_inode,
 }
 
 /**
- * ext4_inode_double_lock - Lock i_mutex on both @inode1 and @inode2
- *
- * @inode1:	the inode structure
- * @inode2:	the inode structure
- *
- * Lock two inodes' i_mutex
- */
-void
-ext4_inode_double_lock(struct inode *inode1, struct inode *inode2)
-{
-	BUG_ON(inode1 == inode2);
-	if (inode1 < inode2) {
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
-	} else {
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
-	}
-}
-
-/**
- * ext4_inode_double_unlock - Release i_mutex on both @inode1 and @inode2
- *
- * @inode1:     the inode that is released first
- * @inode2:     the inode that is released second
- *
- */
-
-void
-ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2)
-{
-	mutex_unlock(&inode1->i_mutex);
-	mutex_unlock(&inode2->i_mutex);
-}
-
-/**
  * ext4_move_extents - Exchange the specified range of a file
  *
  * @o_filp:		file structure of the original file
@@ -1330,7 +1294,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
 		return -EINVAL;
 	}
 	/* Protect orig and donor inodes against a truncate */
-	ext4_inode_double_lock(orig_inode, donor_inode);
+	lock_two_nondirectories(orig_inode, donor_inode);
 
 	/* Wait for all existing dio workers */
 	ext4_inode_block_unlocked_dio(orig_inode);
@@ -1538,7 +1502,7 @@ out:
 	ext4_double_up_write_data_sem(orig_inode, donor_inode);
 	ext4_inode_resume_unlocked_dio(orig_inode);
 	ext4_inode_resume_unlocked_dio(donor_inode);
-	ext4_inode_double_unlock(orig_inode, donor_inode);
+	unlock_two_nondirectories(orig_inode, donor_inode);
 
 	return ret;
 }
diff --git a/fs/inode.c b/fs/inode.c
index 00d5fc3..8f3c6fa 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -980,6 +980,37 @@ void unlock_new_inode(struct inode *inode)
 EXPORT_SYMBOL(unlock_new_inode);
 
 /**
+ * lock_two_nondirectories - take two i_mutexes on non-directory objects
+ * @inode1: first inode to lock
+ * @inode2: second inode to lock
+ */
+void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	WARN_ON_ONCE(S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode));
+	WARN_ON_ONCE(inode1 == inode2);
+	if (inode1 < inode2) {
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	} else {
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+	}
+}
+EXPORT_SYMBOL(lock_two_nondirectories);
+
+/**
+ * unlock_two_nondirectories - release locks from lock_two_nondirectories()
+ * @inode1: first inode to unlock
+ * @inode2: second inode to unlock
+ */
+void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	mutex_unlock(&inode1->i_mutex);
+	mutex_unlock(&inode2->i_mutex);
+}
+EXPORT_SYMBOL(unlock_two_nondirectories);
+
+/**
  * iget5_locked - obtain an inode from a mounted file system
  * @sb:		super block of file system
  * @hashval:	hash value (usually inode number) to get
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 65c2be2..3258761 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -634,6 +634,9 @@ enum inode_i_mutex_lock_class
 	I_MUTEX_QUOTA
 };
 
+void lock_two_nondirectories(struct inode *, struct inode*);
+void unlock_two_nondirectories(struct inode *, struct inode*);
+
 /*
  * NOTE: in a 32bit arch with a preemptable kernel and
  * an UP compile the i_size_read/write must be atomic