Re: [PATCH] ext4: Check io list state and avoid an unnecessary mutex_lock in ext4_end_io_work.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 14, 2011 at 04:33:02PM +0800, Tao Ma wrote:
> From: Tao Ma <boyu.mt@xxxxxxxxxx>
> 
> When we finish the end io work in ext4_flush_completed_IO, we take
> the io work away from the list, but don't free it. Then in the workqueue,
> we can check the list state and then avoid the extra work if it is also
> done. It is good, but we check list state in ext4_end_io_nolock with i_mutex held
> instead of the spin_lock in other places. This is wrong.

Hi Tao,

Thanks for finding and pointing out this bug in ext4_end_io_nolock().
Unfortunately your proposed fix doesn't take into account that there
are other callers of ext4_end_io_nolock() besides ext4_end_io_work(),
and so it's not sufficient to move the test from former function to
the latter.

Attached please find the patch which I am planning to check into the
ext4 tree to address this bug which you have pointed out.

Regards,

						- Ted

commit c0e36d8410bfad4db4edefeb4175f85a5d216c8d
Author: Theodore Ts'o <tytso@xxxxxxx>
Date:   Sat Oct 29 16:57:19 2011 -0400

    ext4: use spinlock before checking io->list in ext4_io_end_nolock()
    
    In ext4_end_io_nolock(), io->list is checked to see if it is empty
    without taking the ei->i_completed_io_lock spinlock.  This violates
    the locking protocol, and can cause very hard to debug failures.
    
    Also optimize ext4_end_io_work() so that if ext4_end_io_nolock() is
    not going to do any work, don't try getting the i_mutex and possibly
    requeuing the end_io request if the trylock doesn't succeed in grabbing
    the mutex.
    
    Thanks to Tao Ma <boyu.mt@xxxxxxxxxx> for spotting the error and
    providing an initial fix to address the problem.
    
    Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx>
    Signed-off-by: Tao Ma <boyu.mt@xxxxxxxxxx>

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 92f38ee..0af5607 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -90,21 +90,27 @@ void ext4_free_io_end(ext4_io_end_t *io)
  */
 int ext4_end_io_nolock(ext4_io_end_t *io)
 {
-	struct inode *inode = io->inode;
-	loff_t offset = io->offset;
-	ssize_t size = io->size;
-	wait_queue_head_t *wq;
-	int ret = 0;
+	unsigned long		flags;
+	struct inode		*inode = io->inode;
+	loff_t			offset = io->offset;
+	ssize_t			size = io->size;
+	struct ext4_inode_info	*ei = EXT4_I(inode);
+	wait_queue_head_t	*wq;
+	int			ret;
 
 	ext4_debug("ext4_end_io_nolock: io 0x%p from inode %lu,list->next 0x%p,"
 		   "list->prev 0x%p\n",
 		   io, inode->i_ino, io->list.next, io->list.prev);
 
-	if (list_empty(&io->list))
-		return ret;
+	spin_lock_irqsave(&ei->i_completed_io_lock, flags);
+	if (list_empty(&io->list)) {
+		spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
+		return 0;
+	}
+	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
 
 	if (!(io->flag & EXT4_IO_END_UNWRITTEN))
-		return ret;
+		return 0;
 
 	ret = ext4_convert_unwritten_extents(inode, offset, size);
 	if (ret < 0) {
@@ -142,6 +148,16 @@ static void ext4_end_io_work(struct work_struct *work)
 	unsigned long		flags;
 	int			ret;
 
+	if (!(io->flag & EXT4_IO_END_UNWRITTEN))
+		goto free_io_end;
+
+	spin_lock_irqsave(&ei->i_completed_io_lock, flags);
+	if (list_empty(&io->list)) {
+		spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
+		goto free_io_end;
+	}
+	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
+
 	if (!mutex_trylock(&inode->i_mutex)) {
 		/*
 		 * Requeue the work instead of waiting so that the work
@@ -170,6 +186,7 @@ static void ext4_end_io_work(struct work_struct *work)
 		list_del_init(&io->list);
 	spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
 	mutex_unlock(&inode->i_mutex);
+free_io_end:
 	ext4_free_io_end(io);
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux