Re: [PATCH] ext4: Fix i_disksize exceeding i_size problem in paritally written case


On Fri 17-03-23 09:35:53, Zhihao Cheng wrote:
> Following process makes i_disksize exceed i_size:
> 
> generic_perform_write
>  copied = iov_iter_copy_from_user_atomic(len) // copied < len
>  ext4_da_write_end
>  | ext4_update_i_disksize
>  |  new_i_size = pos + copied;
>  |  WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize) // update i_disksize
>  | generic_write_end
>  |  copied = block_write_end(copied, len) // copied = 0
>  |   if (unlikely(copied < len))
>  |    if (!PageUptodate(page))
>  |     copied = 0;
>  |  if (pos + copied > inode->i_size) // return false
>  if (unlikely(copied == 0))
>   goto again;
>  if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
>   status = -EFAULT;
>   break;
>  }
> 
> We get i_disksize greater than i_size here, which could trigger WARNING
> check 'i_size_read(inode) < EXT4_I(inode)->i_disksize' while doing dio:
> 
> ext4_dio_write_iter
>  iomap_dio_rw
>   __iomap_dio_rw // return err, length is not aligned to 512
>  ext4_handle_inode_extension
>   WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize) // Oops
> 
>  WARNING: CPU: 2 PID: 2609 at fs/ext4/file.c:319
>  CPU: 2 PID: 2609 Comm: aa Not tainted 6.3.0-rc2
>  RIP: 0010:ext4_file_write_iter+0xbc7
>  Call Trace:
>   vfs_write+0x3b1
>   ksys_write+0x77
>   do_syscall_64+0x39
> 
> Fix it by putting block_write_end() before i_disksize updating just
> like ext4_write_end() does.
> 
> Fetch a reproducer in [Link].
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217209
> Fixes: 64769240bd07f ("ext4: Add delayed allocation support in data=writeback mode")
> Signed-off-by: Zhihao Cheng <chengzhihao1@xxxxxxxxxx>

Good catch (although in practice this will hardly have any negative
effect). But rather than open-coding generic_write_end(), I'd do:

        if (unlikely(copied < len) && !PageUptodate(page))
                copied = 0;

at the beginning of ext4_da_write_end() and that should solve these
problems as well?
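
To illustrate why the early check is enough, here is a minimal userspace
sketch of the ordering problem. It is a toy model, not kernel code: every
toy_* name is hypothetical, ext4_da_should_update_i_disksize() is assumed to
always return true, and page state is reduced to a single flag. It only
models the two facts from the trace above: a short copy into a page that is
not uptodate is discarded, and i_disksize was being updated before that
happened.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two sizes tracked per inode (hypothetical type). */
struct toy_inode {
	long i_size;     /* in-memory file size */
	long i_disksize; /* size destined for the on-disk inode */
};

/* Models block_write_end(): a short copy into a !uptodate page counts as 0. */
static long toy_block_write_end(long len, long copied, bool page_uptodate)
{
	if (copied < len && !page_uptodate)
		return 0;
	return copied;
}

/* Models generic_write_end(): i_size grows only by what was accepted. */
static long toy_generic_write_end(struct toy_inode *inode, long pos, long len,
				  long copied, bool page_uptodate)
{
	copied = toy_block_write_end(len, copied, page_uptodate);
	if (pos + copied > inode->i_size)
		inode->i_size = pos + copied;
	return copied;
}

/*
 * Models ext4_da_write_end(). With early_check set, the suggested test runs
 * first, so i_disksize is computed from the same 'copied' value that
 * toy_generic_write_end() will end up using.
 */
static long toy_da_write_end(struct toy_inode *inode, long pos, long len,
			     long copied, bool page_uptodate, bool early_check)
{
	long new_i_size;

	if (early_check && copied < len && !page_uptodate)
		copied = 0;

	new_i_size = pos + copied;
	/* Assumption: the should-update-i_disksize check always passes. */
	if (copied && new_i_size > inode->i_size &&
	    new_i_size > inode->i_disksize)
		inode->i_disksize = new_i_size;

	return toy_generic_write_end(inode, pos, len, copied, page_uptodate);
}

/* Runs the same partial write (100 of 4096 bytes) with and without the check. */
static void toy_demo(void)
{
	struct toy_inode before = { 0, 0 };
	struct toy_inode after = { 0, 0 };

	toy_da_write_end(&before, 0, 4096, 100, false, false);
	assert(before.i_disksize > before.i_size);  /* the reported state */

	toy_da_write_end(&after, 0, 4096, 100, false, true);
	assert(after.i_disksize <= after.i_size);   /* invariant preserved */
}
```

Calling toy_demo() from a small main() exercises both orderings. In this
model, the only difference between them is whether copied is zeroed before or
after the i_disksize update.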

								Honza

> ---
>  fs/ext4/inode.c | 32 +++++++++++++++++++++++++-------
>  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index bf0b7dea4900..577dc23f3b78 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3136,6 +3136,8 @@ static int ext4_da_write_end(struct file *file,
>  	loff_t new_i_size;
>  	unsigned long start, end;
>  	int write_mode = (int)(unsigned long)fsdata;
> +	bool i_size_changed = false;
> +	loff_t old_size = inode->i_size;
>  
>  	if (write_mode == FALL_BACK_TO_NONDELALLOC)
>  		return ext4_write_end(file, mapping, pos,
> @@ -3148,6 +3150,8 @@ static int ext4_da_write_end(struct file *file,
>  	    ext4_has_inline_data(inode))
>  		return ext4_write_inline_data_end(inode, pos, len, copied, page);
>  
> +	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
> +
>  	start = pos & (PAGE_SIZE - 1);
>  	end = start + copied - 1;
>  
> @@ -3162,16 +3166,30 @@ static int ext4_da_write_end(struct file *file,
>  	 * check), we need to update i_disksize here as neither
>  	 * ext4_writepage() nor certain ext4_writepages() paths not
>  	 * allocating blocks update i_disksize.
> -	 *
> -	 * Note that we defer inode dirtying to generic_write_end() /
> -	 * ext4_da_write_inline_data_end().
>  	 */
>  	new_i_size = pos + copied;
> -	if (copied && new_i_size > inode->i_size &&
> -	    ext4_da_should_update_i_disksize(page, end))
> -		ext4_update_i_disksize(inode, new_i_size);
> +	if (new_i_size > inode->i_size) {
> +		i_size_write(inode, new_i_size);
> +		i_size_changed = true;
> +		if (copied && ext4_da_should_update_i_disksize(page, end))
> +			ext4_update_i_disksize(inode, new_i_size);
> +	}
> +
> +	unlock_page(page);
> +	put_page(page);
> +
> +	if (old_size < pos)
> +		pagecache_isize_extended(inode, old_size, pos);
> +	/*
> +	 * Don't mark the inode dirty under page lock. First, it unnecessarily
> +	 * makes the holding time of page lock longer. Second, it forces lock
> +	 * ordering of page lock and transaction start for journaling
> +	 * filesystems.
> +	 */
> +	if (i_size_changed)
> +		mark_inode_dirty(inode);
>  
> -	return generic_write_end(file, mapping, pos, len, copied, page, fsdata);
> +	return copied;
>  }
>  
>  /*
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


