Re: [PATCH v4 03/10] ext4: don't write back data before punch hole in nojournal mode

Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> · Tue, 17 Dec 2024 20:20:27 +0530



On Tue, Dec 17, 2024 at 08:01:26PM +0530, Ojaswin Mujoo wrote:
> On Mon, Dec 16, 2024 at 09:39:08AM +0800, Zhang Yi wrote:
> > From: Zhang Yi <yi.zhang@xxxxxxxxxx>
> > 
> > There is no need to write back all data before punching a hole in
> > non-journaled mode since it will be dropped soon after removing space.
> > Therefore, the call to filemap_write_and_wait_range() can be eliminated.
> 
> Hi, sorry I'm a bit late to this. Following the discussion here [1], I
> believe the initial concern in PATCH v1 01/10 was that, after truncating
> the pagecache, the ext4_alloc_file_blocks() call might fail with errors
> like EIO, ENOMEM etc., leading to inconsistent data.
> 
> Is my understanding correct that we realised that these are very rare
> cases and are not worth the performance penalty of writeback? In which
> case, is it really okay to just let the scope for corruption exist even
> though it's rare? There might be some other error cases we are
> missing which might be easier to hit. E.g., I think we can also
> fail ext4_alloc_file_blocks() with ENOSPC in case there is a written to
> unwritten extent conversion causing an extent split, leading to an extent
> tree node allocation. (Maybe this can be avoided by using PRE_IO with
> EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT in the first ext4_alloc_file_blocks() call.)
> 
> So does it make sense to retain the writeback behavior, or am I just
> being paranoid? :)
> 
> Regards,
> ojaswin

[1] https://lore.kernel.org/linux-ext4/20240917165007.j5dywaekvnirfffm@quack3/
> 
> > Besides, similar to ext4_zero_range(), we must address the case of
> > partially punched folios when block size < page size. It is essential to
> > remove writable userspace mappings to ensure that the folio can be
> > faulted again during subsequent mmap write access.
> > 
> > In journaled mode, we need to write dirty pages out before discarding
> > page cache in case of crash before committing the freeing data
> > transaction, which could expose old, stale data, even if synchronization
> > has been performed.
> > 
> > Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
> > ---
> >  fs/ext4/inode.c | 18 +++++-------------
> >  1 file changed, 5 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index bf735d06b621..a5ba2b71d508 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -4018,17 +4018,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
> >  
> >  	trace_ext4_punch_hole(inode, offset, length, 0);
> >  
> > -	/*
> > -	 * Write out all dirty pages to avoid race conditions
> > -	 * Then release them.
> > -	 */
> > -	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> > -		ret = filemap_write_and_wait_range(mapping, offset,
> > -						   offset + length - 1);
> > -		if (ret)
> > -			return ret;
> > -	}
> > -
> >  	inode_lock(inode);
> >  
> >  	/* No need to punch hole beyond i_size */
> > @@ -4090,8 +4079,11 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
> >  		ret = ext4_update_disksize_before_punch(inode, offset, length);
> >  		if (ret)
> >  			goto out_dio;
> > -		truncate_pagecache_range(inode, first_block_offset,
> > -					 last_block_offset);
> > +
> > +		ret = ext4_truncate_page_cache_block_range(inode,
> > +				first_block_offset, last_block_offset + 1);
> > +		if (ret)
> > +			goto out_dio;
> >  	}
> >  
> >  	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> > -- 
> > 2.46.1
> >
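
For anyone following the "block size < page size" part of the commit
message, here is a rough userspace sketch of how the block-aligned range
handed to the page cache truncation relates to page boundaries. It only
mirrors the rounding arithmetic and is not kernel code: round_up_pow2(),
round_down_pow2(), struct punch_range and punch_block_range() are
hypothetical names made up for illustration, and the end is treated as
exclusive, which I am assuming matches the "last_block_offset + 1"
argument in the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-ins for the kernel's round_up() and
 * round_down(); align must be a power of two.
 */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* Hypothetical result type for this sketch, not a kernel structure. */
struct punch_range {
	uint64_t start;		/* first byte of the first fully covered block */
	uint64_t end;		/* one past the last fully covered block */
	bool has_full_blocks;	/* false if the hole covers no whole block */
	bool partial_head_page;	/* start is not page aligned */
	bool partial_tail_page;	/* end is not page aligned */
};

/*
 * Compute the block-aligned sub-range of [offset, offset + length) and
 * report whether it begins or ends in the middle of a page. Both
 * blocksize and pagesize are assumed to be powers of two.
 */
static struct punch_range punch_block_range(uint64_t offset, uint64_t length,
					    uint64_t blocksize,
					    uint64_t pagesize)
{
	struct punch_range r = {0};

	r.start = round_up_pow2(offset, blocksize);
	r.end = round_down_pow2(offset + length, blocksize);
	r.has_full_blocks = r.end > r.start;
	if (r.has_full_blocks) {
		r.partial_head_page = (r.start % pagesize) != 0;
		r.partial_tail_page = (r.end % pagesize) != 0;
	}
	return r;
}
```

With 1k blocks and 4k pages, punching offset 1000, length 10000 gives a
block-aligned range that starts and ends inside a page, which is the
partially punched folio case the commit message refers to. With block
size equal to page size, the aligned range always falls on page
boundaries.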