On Wed, Dec 18, 2024 at 03:10:36PM +0800, Zhang Yi wrote: > On 2024/12/17 22:31, Ojaswin Mujoo wrote: > > On Mon, Dec 16, 2024 at 09:39:08AM +0800, Zhang Yi wrote: > >> From: Zhang Yi <yi.zhang@xxxxxxxxxx> > >> > >> There is no need to write back all data before punching a hole in > >> non-journaled mode since it will be dropped soon after removing space. > >> Therefore, the call to filemap_write_and_wait_range() can be eliminated. > > > > Hi, sorry I'm a bit late to this however following the discussion here > > [1], I believe the initial concern was that we don't in PATCH v1 01/10 > > was that after truncating the pagecache, the ext4_alloc_file_blocks() > > call might fail with errors like EIO, ENOMEM etc leading to inconsistent > > data. > > > > Is my understanding correct that we realised that these are very rare > > cases and are not worth the performance penalty of writeback? In which > > case, is it really okay to just let the scope for corruption exist even > > though its rare. There might be some other error cases we might be > > missing which might be more easier to hit. For eg I think we can also > > fail ext4_alloc_file_blocks() with ENOSPC in case there is a written to > > unwritten extent conversion causing an extent split leading to extent > > tree node allocation. (Maybe can be avoided by using PRE_IO with > > EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT in the first ext4_alloc_file_blocks() call) > > > > So does it make sense to retain the writeback behavior or am I just > > being paranoid :) > > > > Hi, Ojaswin! > > Yeah, from my point of view, ENOSPC could happen, and it may be more > likely to happen if we intentionally create conditions for it. However, > all the efforts we can make at this point are merely best efforts and > reduce the probability. We cannot 100% guarantee it will not happen, > even if we write back the whole range before manipulating extents and > blocks. This is because we do not accurately reserve space for extents > split. Additionally, In ext4_punch_hole(), we have used 'nofail' flag Right, rechecking the ext4_map_blocks code, seems like we can also result in a failure after unwrit extents have successfully been allocated so either ways we can't be sure that we'll retain old data on failure even with writeback. > while freeing blocks to reduce the possibility of ENOSPC. So I suppose > it's fine by now, but we may need to implement additional measures if > we truly want to resolve the issue completely. Sure I agree that in that case we should ideally have something more robust to handle these edge cases. For now, this change looks good. Feel free to add: Reviewed-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> > > Thanks, > Yi. > > > > >> Besides, similar to ext4_zero_range(), we must address the case of > >> partially punched folios when block size < page size. It is essential to > >> remove writable userspace mappings to ensure that the folio can be > >> faulted again during subsequent mmap write access. > >> > >> In journaled mode, we need to write dirty pages out before discarding > >> page cache in case of crash before committing the freeing data > >> transaction, which could expose old, stale data, even if synchronization > >> has been performed. > >> > >> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> > >> --- > >> fs/ext4/inode.c | 18 +++++------------- > >> 1 file changed, 5 insertions(+), 13 deletions(-) > >> > >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > >> index bf735d06b621..a5ba2b71d508 100644 > >> --- a/fs/ext4/inode.c > >> +++ b/fs/ext4/inode.c > >> @@ -4018,17 +4018,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) > >> > >> trace_ext4_punch_hole(inode, offset, length, 0); > >> > >> - /* > >> - * Write out all dirty pages to avoid race conditions > >> - * Then release them. > >> - */ > >> - if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) { > >> - ret = filemap_write_and_wait_range(mapping, offset, > >> - offset + length - 1); > >> - if (ret) > >> - return ret; > >> - } > >> - > >> inode_lock(inode); > >> > >> /* No need to punch hole beyond i_size */ > >> @@ -4090,8 +4079,11 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) > >> ret = ext4_update_disksize_before_punch(inode, offset, length); > >> if (ret) > >> goto out_dio; > >> - truncate_pagecache_range(inode, first_block_offset, > >> - last_block_offset); > >> + > >> + ret = ext4_truncate_page_cache_block_range(inode, > >> + first_block_offset, last_block_offset + 1); > >> + if (ret) > >> + goto out_dio; > >> } > >> > >> if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) > >> -- > >> 2.46.1 > >> >