Re: [PATCH v3 3/4] iomap: elide flush from partial eof zero range

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 08, 2024 at 07:42:45AM -0500, Brian Foster wrote:
> iomap zero range flushes pagecache in certain situations to
> determine which parts of the range might require zeroing if dirty
> data is present in pagecache. The kernel robot recently reported a
> regression associated with this flushing in the following stress-ng
> workload on XFS:
> 
> stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64
> 
> This workload involves repeated small, strided, extending writes. On
> XFS, this produces a pattern of post-eof speculative preallocation,
> conversion of preallocation from delalloc to unwritten, dirtying
> pagecache over newly unwritten blocks, and then rinse and repeat
> from the new EOF. This leads to repetitive flushing of the EOF folio
> via the zero range call XFS uses for writes that start beyond
> current EOF.
> 
> To mitigate this problem, special case EOF block zeroing to prefer
> zeroing the folio over a flush when the EOF folio is already dirty.
> To do this, split out and open code handling of an unaligned start
> offset. This brings most of the performance back by avoiding flushes
> on zero range calls via write and truncate extension operations. The
> flush doesn't occur in these situations because the entire range is
> post-eof and therefore the folio that overlaps EOF is the only one
> in the range.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a78b5b9b3df3..7f40234a301e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1401,6 +1401,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  		.len		= len,
>  		.flags		= IOMAP_ZERO,
>  	};
> +	struct address_space *mapping = inode->i_mapping;
> +	unsigned int blocksize = i_blocksize(inode);
> +	unsigned int off = pos & (blocksize - 1);
> +	loff_t plen = min_t(loff_t, len, blocksize - off);
>  	int ret;
>  	bool range_dirty;
>  
> @@ -1410,12 +1414,28 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  	 * mapping converts on writeback completion and so must be zeroed.
>  	 *
>  	 * The simplest way to deal with this across a range is to flush
> -	 * pagecache and process the updated mappings. To avoid an unconditional
> -	 * flush, check pagecache state and only flush if dirty and the fs
> -	 * returns a mapping that might convert on writeback.
> +	 * pagecache and process the updated mappings. To avoid excessive
> +	 * flushing on partial eof zeroing, special case it to zero the
> +	 * unaligned start portion if already dirty in pagecache.
> +	 */
> +	if (off &&
> +	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
> +		iter.len = plen;
> +		while ((ret = iomap_iter(&iter, ops)) > 0)
> +			iter.processed = iomap_zero_iter(&iter, did_zero);
> +
> +		iter.len = len - (iter.pos - pos);
> +		if (ret || !iter.len)
> +			return ret;

This looks much cleaner to me now, thanks for iterating :)
Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> +	}
> +
> +	/*
> +	 * To avoid an unconditional flush, check pagecache state and only flush
> +	 * if dirty and the fs returns a mapping that might convert on
> +	 * writeback.
>  	 */
>  	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> -					pos, pos + len - 1);
> +					iter.pos, iter.pos + iter.len - 1);
>  	while ((ret = iomap_iter(&iter, ops)) > 0) {
>  		const struct iomap *s = iomap_iter_srcmap(&iter);
>  
> -- 
> 2.47.0
> 
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux