On Tue, Dec 17, 2024 at 08:01:26PM +0530, Ojaswin Mujoo wrote: > On Mon, Dec 16, 2024 at 09:39:08AM +0800, Zhang Yi wrote: > > From: Zhang Yi <yi.zhang@xxxxxxxxxx> > > > > There is no need to write back all data before punching a hole in > > non-journaled mode since it will be dropped soon after removing space. > > Therefore, the call to filemap_write_and_wait_range() can be eliminated. > > Hi, sorry I'm a bit late to this however following the discussion here > [1], I believe the initial concern was that we don't in PATCH v1 01/10 > was that after truncating the pagecache, the ext4_alloc_file_blocks() > call might fail with errors like EIO, ENOMEM etc leading to inconsistent > data. > > Is my understanding correct that we realised that these are very rare > cases and are not worth the performance penalty of writeback? In which > case, is it really okay to just let the scope for corruption exist even > though its rare. There might be some other error cases we might be > missing which might be more easier to hit. For eg I think we can also > fail ext4_alloc_file_blocks() with ENOSPC in case there is a written to > unwritten extent conversion causing an extent split leading to extent > tree node allocation. (Maybe can be avoided by using PRE_IO with > EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT in the first ext4_alloc_file_blocks() call) > > So does it make sense to retain the writeback behavior or am I just > being paranoid :) > > Regards, > ojaswin [1] https://lore.kernel.org/linux-ext4/20240917165007.j5dywaekvnirfffm@quack3/ > > > Besides, similar to ext4_zero_range(), we must address the case of > > partially punched folios when block size < page size. It is essential to > > remove writable userspace mappings to ensure that the folio can be > > faulted again during subsequent mmap write access. > > > > In journaled mode, we need to write dirty pages out before discarding > > page cache in case of crash before committing the freeing data > > transaction, which could expose old, stale data, even if synchronization > > has been performed. > > > > Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> > > --- > > fs/ext4/inode.c | 18 +++++------------- > > 1 file changed, 5 insertions(+), 13 deletions(-) > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index bf735d06b621..a5ba2b71d508 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -4018,17 +4018,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) > > > > trace_ext4_punch_hole(inode, offset, length, 0); > > > > - /* > > - * Write out all dirty pages to avoid race conditions > > - * Then release them. > > - */ > > - if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) { > > - ret = filemap_write_and_wait_range(mapping, offset, > > - offset + length - 1); > > - if (ret) > > - return ret; > > - } > > - > > inode_lock(inode); > > > > /* No need to punch hole beyond i_size */ > > @@ -4090,8 +4079,11 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) > > ret = ext4_update_disksize_before_punch(inode, offset, length); > > if (ret) > > goto out_dio; > > - truncate_pagecache_range(inode, first_block_offset, > > - last_block_offset); > > + > > + ret = ext4_truncate_page_cache_block_range(inode, > > + first_block_offset, last_block_offset + 1); > > + if (ret) > > + goto out_dio; > > } > > > > if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) > > -- > > 2.46.1 > >