Re: [PATCH v2 2/2] ext4: update i_disksize if direct write past ondisk size

On Mon 12-03-18 23:21:56, Eryu Guan wrote:
> Currently in the ext4 direct write path, we update i_disksize only when
> the new eof is greater than i_size, and don't update it even when the
> new eof is greater than i_disksize but less than i_size. This doesn't
> work well with delalloc buffer write, which updates i_size and
> i_disksize only when delalloc blocks are resolved (at writeback
> time). The i_disksize from a direct write can be lost if a previous
> buffer write succeeded at write time but failed at writeback time,
> which then results in a corrupted ondisk inode size.
> 
> Consider this case: first, buffer write 4k of data to a new file at
> offset 16k with delayed allocation. Then direct write 4k of data to the
> same file at offset 4k before the delalloc blocks are resolved. This
> doesn't update i_disksize because it writes within i_size (20k), but
> the extent tree metadata has been committed to the journal. Then
> writeback of the delalloc blocks fails (due to a device error etc.),
> and i_size/i_disksize from the buffer write can't be written to disk
> (still zero). A subsequent umount/mount cycle recovers the journal and
> writes the extent tree metadata from the direct write to disk, but with
> i_disksize being zero.
> 
> Fix it by updating i_disksize too in direct write path when new eof
> is greater than i_disksize but less than i_size, so i_disksize is
> always consistent with direct write.
> 
> This fixes occasional i_size corruption in fstests generic/475.
> 
> Signed-off-by: Eryu Guan <guaneryu@xxxxxxxxx>

Thanks for fixing this. The patch looks good. You can add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

								Honza

> ---
> 
> v2:
> - basically no change since v1, just fix the locking issue first, and
>   reintroduce the "ei" definition in this patch.
> 
> I've tested this patchset by looping generic/475 200 times without
> hitting a corruption; it usually fails within 5 iterations for me. Also
> tested by full fstests runs on ext2_4k, ext3_2k, ext4_1k configurations
> and dio tests from LTP, all results looked good.
> 
>  fs/ext4/inode.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index bff44b4a0783..9acac476c15c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3658,6 +3658,7 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = file->f_mapping->host;
> +	struct ext4_inode_info *ei = EXT4_I(inode);
>  	ssize_t ret;
>  	loff_t offset = iocb->ki_pos;
>  	size_t count = iov_iter_count(iter);
> @@ -3668,7 +3669,7 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter)
>  	int orphan = 0;
>  	handle_t *handle;
>  
> -	if (final_size > inode->i_size) {
> +	if (final_size > inode->i_size || final_size > ei->i_disksize) {
>  		/* Credits for sb + inode write */
>  		handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
>  		if (IS_ERR(handle)) {
> @@ -3788,9 +3789,10 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter)
>  			ext4_orphan_del(handle, inode);
>  		if (ret > 0) {
>  			loff_t end = offset + ret;
> -			if (end > inode->i_size) {
> +			if (end > inode->i_size || end > ei->i_disksize) {
>  				ext4_update_i_disksize(inode, end);
> -				i_size_write(inode, end);
> +				if (end > inode->i_size)
> +					i_size_write(inode, end);
>  				/*
>  				 * We're going to return a positive `ret'
>  				 * here due to non-zero-length I/O, so there's
> -- 
> 2.14.3
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
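
For readers following the scenario in the commit message, here is a minimal userspace sketch of the size bookkeeping, not the kernel code itself. The struct and the model_* names are hypothetical and exist only for illustration. The model assumes the buffered write has already raised the in-memory i_size to 20k while i_disksize is still zero, as the "writes within i_size (20k)" wording implies, and it only mimics the end-of-write condition changed by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sizes ext4 tracks per inode. */
struct model_inode {
	long long i_size;     /* in-memory file size */
	long long i_disksize; /* size destined for the on-disk inode */
};

/*
 * Model of the size update at the end of a direct write.
 * patched == false: old condition, end > i_size only.
 * patched == true:  condition from this patch, which also
 *                   triggers when end > i_disksize.
 */
static void model_dio_write_end(struct model_inode *inode,
				long long end, bool patched)
{
	bool extend = end > inode->i_size;

	if (patched)
		extend = extend || end > inode->i_disksize;
	if (!extend)
		return;

	/* Like ext4_update_i_disksize(), only ever grow i_disksize. */
	if (end > inode->i_disksize)
		inode->i_disksize = end;
	/* i_size is raised only when the write really extends the file. */
	if (end > inode->i_size)
		inode->i_size = end;
}

/* Replays the commit-message scenario; returns the final i_disksize. */
static long long model_scenario(bool patched)
{
	struct model_inode inode = { 0, 0 };

	/* Buffered delalloc write of 4k at offset 16k (assumed effect). */
	inode.i_size = 20 * 1024;

	/* Direct write of 4k at offset 4k, so the new eof is 8k. */
	model_dio_write_end(&inode, 8 * 1024, patched);

	/* Writeback fails, so nothing raises i_disksize to 20k. */
	return inode.i_disksize;
}

/* Self-check: the old condition leaves 0, the patched one leaves 8k. */
static void model_self_check(void)
{
	assert(model_scenario(false) == 0);
	assert(model_scenario(true) == 8 * 1024);
}
```

Calling model_self_check() from a test main() exercises both variants. In this model the unpatched condition leaves i_disksize at zero even though the direct write's extents are journaled, and the patched condition leaves it at 8k.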


