Re: [PATCH 4/7] ext4: fsync should wait for DIO writers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun 09-09-12 21:27:11, Dmitry Monakhov wrote:
> fsync and punch_hole are the places where we have to wait for all
> existing writers (writeback, aio, dio), but currently we simply
> flush pended end_io request which is not sufficient.
  Why not? I guess you mean the fact that there can be DIO in flight for
which end_io() was not called so it is not queued in the queue? But that is
OK - we have not yet called aio_complete() for that IO so for userspace the
write has not happened yet. Thus there's no need to flush it to disk -
fsync() does not say anything about writes in progress while fsync is
called.

> Even more i_mutex is not holded while punch_hole which obviously
> result in dangerous data corruption due to write-after-free.
  Yes, that's a bug. I also noticed that but didn't get to fixing it (I'm
actually working on a more long term fix using range locking but that's
more of a research project so having somehow fixed at least the most
blatant locking problems is good).

								Honza

> 
> This patch performs following changes:
> 
> - Guard punch_hole with i_mutex
> - fsync and punch_hole now wait for all writers in flight
>   NOTE: XXX write-after-free race is still possible because
>   truncate_pagecache_range() is not completely reliable and where
>   is no easy way to stop writeback while punch_hole is in progress.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
> ---
>  fs/ext4/extents.c |   10 ++++++++--
>  fs/ext4/fsync.c   |    1 +
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index e993879..8252651 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4845,6 +4845,7 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  			return err;
>  	}
>  
> +	mutex_lock(&inode->i_mutex);
>  	/* Now release the pages */
>  	if (last_page_offset > first_page_offset) {
>  		truncate_pagecache_range(inode, first_page_offset,
> @@ -4852,12 +4853,15 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	}
>  
>  	/* finish any pending end_io work */
> +	inode_dio_wait(inode);
>  	ext4_flush_completed_IO(inode);
>  
>  	credits = ext4_writepage_trans_blocks(inode);
>  	handle = ext4_journal_start(inode, credits);
> -	if (IS_ERR(handle))
> -		return PTR_ERR(handle);
> +	if (IS_ERR(handle)) {
> +		err = PTR_ERR(handle);
> +		goto out_mutex;
> +	}
>  
>  	err = ext4_orphan_add(handle, inode);
>  	if (err)
> @@ -4951,6 +4955,8 @@ out:
>  	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
>  	ext4_mark_inode_dirty(handle, inode);
>  	ext4_journal_stop(handle);
> +out_mutex:
> +	mutex_unlock(&inode->i_mutex);
>  	return err;
>  }
>  int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 24f3719..290c5cf 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -204,6 +204,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  	if (inode->i_sb->s_flags & MS_RDONLY)
>  		goto out;
>  
> +	inode_dio_wait(inode);
>  	ret = ext4_flush_completed_IO(inode);
>  	if (ret < 0)
>  		goto out;
> -- 
> 1.7.7.6
> 
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux