Re: [PATCH v2 2/2] iomap: elide flush from partial eof zero range

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 31, 2024 at 10:04:48AM -0400, Brian Foster wrote:
> iomap zero range flushes pagecache in certain situations to
> determine which parts of the range might require zeroing if dirty
> data is present in pagecache. The kernel robot recently reported a
> regression associated with this flushing in the following stress-ng
> workload on XFS:
> 
> stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64
> 
> This workload involves repeated small, strided, extending writes. On
> XFS, this produces a pattern of post-eof speculative preallocation,
> conversion of preallocation from delalloc to unwritten, dirtying
> pagecache over newly unwritten blocks, and then rinse and repeat
> from the new EOF. This leads to repetitive flushing of the EOF folio
> via the zero range call XFS uses for writes that start beyond
> current EOF.
> 
> To mitigate this problem, special case EOF block zeroing to prefer
> zeroing the folio over a flush when the EOF folio is already dirty.
> To do this, split out and open code handling of an unaligned start
> offset. This brings most of the performance back by avoiding flushes
> on zero range calls via write and truncate extension operations. The
> flush doesn't occur in these situations because the entire range is
> post-eof and therefore the folio that overlaps EOF is the only one
> in the range.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>

Cc: <stable@xxxxxxxxxxxxxxx> # v6.12-rc1
Fixes: 7d9b474ee4cc37 ("iomap: make zero range flush conditional on unwritten mappings")

perhaps?

> ---
>  fs/iomap/buffered-io.c | 42 ++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 60386cb7b9ef..343a2fa29bec 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -227,6 +227,18 @@ static void ifs_free(struct folio *folio)
>  	kfree(ifs);
>  }
>  
> +/* helper to reset an iter for reuse */
> +static inline void
> +iomap_iter_init(struct iomap_iter *iter, struct inode *inode, loff_t pos,
> +		loff_t len, unsigned flags)

Nit: maybe call this iomap_iter_reset() ?

Also I wonder if it's really safe to zero iomap_iter::private?
Won't doing that leave a minor logic bomb?

> +{
> +	memset(iter, 0, sizeof(*iter));
> +	iter->inode = inode;
> +	iter->pos = pos;
> +	iter->len = len;
> +	iter->flags = flags;
> +}
> +
>  /*
>   * Calculate the range inside the folio that we actually need to read.
>   */
> @@ -1416,6 +1428,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  		.len		= len,
>  		.flags		= IOMAP_ZERO,
>  	};
> +	struct address_space *mapping = inode->i_mapping;
> +	unsigned int blocksize = i_blocksize(inode);
> +	unsigned int off = pos & (blocksize - 1);
> +	loff_t plen = min_t(loff_t, len, blocksize - off);
>  	int ret;
>  	bool range_dirty;
>  
> @@ -1425,12 +1441,30 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  	 * mapping converts on writeback completion and must be zeroed.
>  	 *
>  	 * The simplest way to deal with this is to flush pagecache and process
> -	 * the updated mappings. To avoid an unconditional flush, check dirty
> -	 * state and defer the flush until a combination of dirty pagecache and
> -	 * at least one mapping that might convert on writeback is seen.
> +	 * the updated mappings. First, special case the partial eof zeroing
> +	 * use case since it is more performance sensitive. Zero the start of
> +	 * the range if unaligned and already dirty in pagecache.
> +	 */
> +	if (off &&
> +	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
> +		iter.len = plen;
> +		while ((ret = iomap_iter(&iter, ops)) > 0)
> +			iter.processed = iomap_zero_iter(&iter, did_zero);
> +
> +		/* reset iterator for the rest of the range */
> +		iomap_iter_init(&iter, inode, iter.pos,
> +			len - (iter.pos - pos), IOMAP_ZERO);

Nit: maybe one more tab ^ here?

Also from the previous thread: can you reset the original iter instead
of declaring a second one by zeroing the mappings/processed fields,
re-expanding iter::len, and resetting iter::flags?

I guess we'll still do the flush if the start of the zeroing range
aligns with an fsblock?  I guess if you're going to do a lot of small
extensions then once per fsblock isn't too bad?

--D

> +		if (ret || !iter.len)
> +			return ret;
> +	}
> +
> +	/*
> +	 * To avoid an unconditional flush, check dirty state and defer the
> +	 * flush until a combination of dirty pagecache and at least one
> +	 * mapping that might convert on writeback is seen.
>  	 */
>  	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> -					pos, pos + len - 1);
> +					iter.pos, iter.pos + iter.len - 1);
>  	while ((ret = iomap_iter(&iter, ops)) > 0) {
>  		const struct iomap *s = iomap_iter_srcmap(&iter);
>  		if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> -- 
> 2.46.2
> 
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux