Re: [PATCH] iomap: elide zero range flush from partial eof zeroing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 23, 2024 at 10:30:29AM -0400, Brian Foster wrote:
> iomap zero range performs a pagecache flush upon seeing unwritten
> extents with dirty pagecache in order to determine accurate
> subranges that require direct zeroing. This is to support an
> optimization where clean, unwritten ranges are skipped as they are
> already zero on-disk.
> 
> Certain use cases for zero range are more sensitive to flush latency
> than others. The kernel test robot recently reported a regression in
> the following stress-ng workload on XFS:
> 
>   stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64
> 
> This workload involves a series of small, strided, write extending
> writes. On XFS, this produces a pattern of allocating post-eof
> speculative preallocation, converting preallocation to unwritten on
> zero range calls, dirtying pagecache over the converted mapping, and
> then repeating the sequence again from the updated EOF. This
> basically produces a sequence of pagecache flushes on the partial
> EOF block zeroing use case of zero range.
> 
> To mitigate this problem, special case the EOF block zeroing use
> case to prefer zeroing over a pagecache flush when the EOF folio is
> already dirty. This brings most of the performance back by avoiding
> flushes on write and truncate extension operations, while preserving
> the ability for iomap to flush and properly process larger ranges.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
> 
> Hi iomap maintainers,
> 
> This is an incremental optimization for the regression reported by the
> test robot here[1]. I'm not totally convinced this is necessary as an
> immediate fix, but the discussion on that thread was enough to suggest
> it could be. I don't really love the factoring, but I had to play a bit
> of whack-a-mole between fstests and stress-ng to restore performance and
> still maintain behavior expectations for some of the tests.
> 
> On a positive note, exploring this gave me what I think is a better idea
> for dealing with zero range overall, so I'm working on a followup to
> this that reworks it by splitting zero range across block alignment
> boundaries (similar to how something like truncate page range works, for
> example). This simplifies things by isolating the dirty range check to a
> single folio on an unaligned start offset, which lets the _iter() call
> do a skip or zero (i.e. no more flush_and_stale()), and then
> unconditionally flush the aligned portion to end-of-range. The latter
> flush should be a no-op for every use case I've seen so far, so this
> might entirely avoid the need for anything more complex for zero range.
> 
> In summary, I'm posting this as an optional and more "stable-worthy"
> patch for reference and for the maintainers to consider as they like. I
> think it's reasonable to include if we are concerned about this
> particular stress-ng test and are Ok with it as a transient solution.
> But if it were up to me, I'd probably sit on it for a bit to determine
> if a more practical user/workload is affected by this, particularly
> knowing that I'm trying to rework it. This could always be applied as a
> stable fix if really needed, but I just don't think the slightly more
> invasive rework is appropriate for -rc..
> 
> Thoughts, reviews, flames appreciated.
> 
> Brian
> 
> [1] https://lore.kernel.org/linux-xfs/202410141536.1167190b-oliver.sang@xxxxxxxxx/
> 
>  fs/iomap/buffered-io.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index aa587b2142e2..8fd25b14d120 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1372,6 +1372,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
>  	loff_t pos = iter->pos;
>  	loff_t length = iomap_length(iter);
>  	loff_t written = 0;
> +	bool eof_zero = false;
>  
>  	/*
>  	 * We must zero subranges of unwritten mappings that might be dirty in
> @@ -1391,12 +1392,23 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
>  	 * triggers writeback time post-eof zeroing.
>  	 */
>  	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
> -		if (*range_dirty) {
> +		/* range is clean and already zeroed, nothing to do */
> +		if (!*range_dirty)
> +			return length;
> +
> +		/* flush for anything other than partial eof zeroing */
> +		if (pos != i_size_read(iter->inode) ||
> +		   (pos % i_blocksize(iter->inode)) == 0) {
>  			*range_dirty = false;
>  			return iomap_zero_iter_flush_and_stale(iter);
>  		}
> -		/* range is clean and already zeroed, nothing to do */
> -		return length;
> +		/*
> +		 * Special case partial EOF zeroing. Since we know the EOF
> +		 * folio is dirty, prefer in-memory zeroing for it. This avoids
> +		 * excessive flush latency on frequent file size extending
> +		 * operations.
> +		 */
> +		eof_zero = true;
>  	}
>  
>  	do {
> @@ -1415,6 +1427,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
>  		offset = offset_in_folio(folio, pos);
>  		if (bytes > folio_size(folio) - offset)
>  			bytes = folio_size(folio) - offset;
> +		if (eof_zero && length > bytes)
> +			length = bytes;

What does this do?  I think this causes the loop to break after putting
the folio that caches @pos?  And then I guess we go around the loop in
iomap_zero_range again if there were more bytes to zero after this
folio?

--D

>  
>  		folio_zero_range(folio, offset, bytes);
>  		folio_mark_accessed(folio);
> -- 
> 2.46.2
> 
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux