Re: [PATCH 7/7] writeback: Avoid iput() from flusher thread

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 20, 2012 at 11:56:31PM +0100, Jan Kara wrote:
> Doing iput() from flusher thread (writeback_sb_inodes()) can create problems
> because iput() can do a lot of work - for example truncate the inode if it's
> the last iput on unlinked file. Some filesystems depend on flusher thread
> progressing (e.g. because they need to flush delay allocated blocks to reduce
> allocation uncertainty) and so flusher thread doing truncate creates
> interesting dependencies and possibilities for deadlocks.
> 
> We get rid of iput() in flusher thread by using the fact that I_SYNC inode
> flag effectively pins the inode in memory. So if we take care to either hold
> i_lock or have I_SYNC set, we can get away without taking inode reference
> in writeback_sb_inodes().
> 
> To make things work we have to move waiting for I_SYNC from end_writeback() to
> evict() just before calling of ->evict_inode. This is because several
> filesystems call end_writeback() after they have deleted the inode (btrfs,
> gfs2, ...) and other filesystems (ext3, ext4, reiserfs, ...) can deadlock when
> waiting for I_SYNC because they call end_writeback() from within a transaction.
> Both were not really a problem previously because flusher thread and
> ->evict_inode() could not run in parallel but now these two could race.
> So moving of I_SYNC wait prevents possible races..
> 
> As a side effect of these changes, we also fix possible use-after-free in
> wb_writeback() because inode_wait_for_writeback() call could try to reacquire
> i_lock on the inode that was already free.
.....

> diff --git a/fs/inode.c b/fs/inode.c
> index d3ebdbe..3869714 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -510,7 +510,6 @@ void end_writeback(struct inode *inode)
>  	BUG_ON(!list_empty(&inode->i_data.private_list));
>  	BUG_ON(!(inode->i_state & I_FREEING));
>  	BUG_ON(inode->i_state & I_CLEAR);
> -	inode_sync_wait(inode);
>  	/* don't need i_lock here, no concurrent mods to i_state */
>  	inode->i_state = I_FREEING | I_CLEAR;
>  }
> @@ -541,6 +540,18 @@ static void evict(struct inode *inode)
>  
>  	inode_sb_list_del(inode);
>  
> +	/*
> +	 * Wait for flusher thread to be done with the inode so that filesystem
> +	 * does not start destroying it while writeback is still running. Since
> +	 * the inode has I_FREEING set, flusher thread won't start new work on
> +	 * the inode.  We just have to wait for running writeback to finish. We
> +	 * must use i_lock here because flusher thread might be working with
> +	 * the inode without I_SYNC set but under i_lock.
> +	 */
> +	spin_lock(&inode->i_lock);
> +	inode_wait_for_writeback(inode);
> +	spin_unlock(&inode->i_lock);
> +

Why move this wait from end_writeback() to here?  The whole point
of end_writeback() is to provide a barrier that guarantees that
there is no async writeback running when it returns, so it seems
strange to move the barrier out of the function that is supposed to
provide the barrier....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux