On Fri, Nov 08, 2024 at 07:42:45AM -0500, Brian Foster wrote:
> iomap zero range flushes pagecache in certain situations to
> determine which parts of the range might require zeroing if dirty
> data is present in pagecache. The kernel robot recently reported a
> regression associated with this flushing in the following stress-ng
> workload on XFS:
> 
> stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64
> 
> This workload involves repeated small, strided, extending writes. On
> XFS, this produces a pattern of post-eof speculative preallocation,
> conversion of preallocation from delalloc to unwritten, dirtying
> pagecache over newly unwritten blocks, and then rinse and repeat
> from the new EOF. This leads to repetitive flushing of the EOF folio
> via the zero range call XFS uses for writes that start beyond
> current EOF.
> 
> To mitigate this problem, special case EOF block zeroing to prefer
> zeroing the folio over a flush when the EOF folio is already dirty.
> To do this, split out and open code handling of an unaligned start
> offset. This brings most of the performance back by avoiding flushes
> on zero range calls via write and truncate extension operations. The
> flush doesn't occur in these situations because the entire range is
> post-eof and therefore the folio that overlaps EOF is the only one
> in the range.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a78b5b9b3df3..7f40234a301e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1401,6 +1401,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  		.len		= len,
>  		.flags		= IOMAP_ZERO,
>  	};
> +	struct address_space *mapping = inode->i_mapping;
> +	unsigned int blocksize = i_blocksize(inode);
> +	unsigned int off = pos & (blocksize - 1);
> +	loff_t plen = min_t(loff_t, len, blocksize - off);
>  	int ret;
>  	bool range_dirty;
>  
> @@ -1410,12 +1414,28 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  	 * mapping converts on writeback completion and so must be zeroed.
>  	 *
>  	 * The simplest way to deal with this across a range is to flush
> -	 * pagecache and process the updated mappings. To avoid an unconditional
> -	 * flush, check pagecache state and only flush if dirty and the fs
> -	 * returns a mapping that might convert on writeback.
> +	 * pagecache and process the updated mappings. To avoid excessive
> +	 * flushing on partial eof zeroing, special case it to zero the
> +	 * unaligned start portion if already dirty in pagecache.
> +	 */
> +	if (off &&
> +	    filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) {
> +		iter.len = plen;
> +		while ((ret = iomap_iter(&iter, ops)) > 0)
> +			iter.processed = iomap_zero_iter(&iter, did_zero);
> +
> +		iter.len = len - (iter.pos - pos);
> +		if (ret || !iter.len)
> +			return ret;

This looks much cleaner to me now, thanks for iterating :)

Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

--D

> +	}
> +
> +	/*
> +	 * To avoid an unconditional flush, check pagecache state and only flush
> +	 * if dirty and the fs returns a mapping that might convert on
> +	 * writeback.
>  	 */
>  	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> -			pos, pos + len - 1);
> +			iter.pos, iter.pos + iter.len - 1);
>  	while ((ret = iomap_iter(&iter, ops)) > 0) {
>  		const struct iomap *s = iomap_iter_srcmap(&iter);
> 
> -- 
> 2.47.0
> 
> 
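For context on "the zero range call XFS uses for writes that start beyond
current EOF" mentioned in the commit message: a rough sketch of the caller
side, assuming the mainline XFS flow around the time of this patch. The real
logic lives in xfs_file_write_checks() in fs/xfs/xfs_file.c; the function
name example_zero_eof() is made up for illustration, and the locking,
drained-dio retry, and error unwinding are omitted:

	/*
	 * Illustrative sketch, not part of the patch: an extending
	 * buffered write first zeroes the gap between the old EOF and
	 * the start of the write. xfs_zero_range() wraps
	 * iomap_zero_range(), so with the patch above a dirty EOF folio
	 * in that gap is zeroed in place rather than flushed first.
	 */
	static int example_zero_eof(struct kiocb *iocb, struct xfs_inode *ip)
	{
		loff_t isize = i_size_read(VFS_I(ip));

		/* Write starts at or before EOF: nothing to zero here. */
		if (iocb->ki_pos <= isize)
			return 0;

		/* Zero [old EOF, write start); hits the fast path above. */
		return xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
	}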