On Thu, Oct 31, 2024 at 10:04:48AM -0400, Brian Foster wrote: > iomap zero range flushes pagecache in certain situations to > determine which parts of the range might require zeroing if dirty > data is present in pagecache. The kernel robot recently reported a > regression associated with this flushing in the following stress-ng > workload on XFS: > > stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64 > > This workload involves repeated small, strided, extending writes. On > XFS, this produces a pattern of post-eof speculative preallocation, > conversion of preallocation from delalloc to unwritten, dirtying > pagecache over newly unwritten blocks, and then rinse and repeat > from the new EOF. This leads to repetitive flushing of the EOF folio > via the zero range call XFS uses for writes that start beyond > current EOF. > > To mitigate this problem, special case EOF block zeroing to prefer > zeroing the folio over a flush when the EOF folio is already dirty. > To do this, split out and open code handling of an unaligned start > offset. This brings most of the performance back by avoiding flushes > on zero range calls via write and truncate extension operations. The > flush doesn't occur in these situations because the entire range is > post-eof and therefore the folio that overlaps EOF is the only one > in the range. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> # v6.12-rc1 Fixes: 7d9b474ee4cc37 ("iomap: make zero range flush conditional on unwritten mappings") perhaps? > --- > fs/iomap/buffered-io.c | 42 ++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 38 insertions(+), 4 deletions(-) > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index 60386cb7b9ef..343a2fa29bec 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -227,6 +227,18 @@ static void ifs_free(struct folio *folio) > kfree(ifs); > } > > +/* helper to reset an iter for reuse */ > +static inline void > +iomap_iter_init(struct iomap_iter *iter, struct inode *inode, loff_t pos, > + loff_t len, unsigned flags) Nit: maybe call this iomap_iter_reset() ? Also I wonder if it's really safe to zero iomap_iter::private? Won't doing that leave a minor logic bomb? > +{ > + memset(iter, 0, sizeof(*iter)); > + iter->inode = inode; > + iter->pos = pos; > + iter->len = len; > + iter->flags = flags; > +} > + > /* > * Calculate the range inside the folio that we actually need to read. > */ > @@ -1416,6 +1428,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, > .len = len, > .flags = IOMAP_ZERO, > }; > + struct address_space *mapping = inode->i_mapping; > + unsigned int blocksize = i_blocksize(inode); > + unsigned int off = pos & (blocksize - 1); > + loff_t plen = min_t(loff_t, len, blocksize - off); > int ret; > bool range_dirty; > > @@ -1425,12 +1441,30 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, > * mapping converts on writeback completion and must be zeroed. > * > * The simplest way to deal with this is to flush pagecache and process > - * the updated mappings. To avoid an unconditional flush, check dirty > - * state and defer the flush until a combination of dirty pagecache and > - * at least one mapping that might convert on writeback is seen. > + * the updated mappings. First, special case the partial eof zeroing > + * use case since it is more performance sensitive. Zero the start of > + * the range if unaligned and already dirty in pagecache. > + */ > + if (off && > + filemap_range_needs_writeback(mapping, pos, pos + plen - 1)) { > + iter.len = plen; > + while ((ret = iomap_iter(&iter, ops)) > 0) > + iter.processed = iomap_zero_iter(&iter, did_zero); > + > + /* reset iterator for the rest of the range */ > + iomap_iter_init(&iter, inode, iter.pos, > + len - (iter.pos - pos), IOMAP_ZERO); Nit: maybe one more tab ^ here? Also from the previous thread: can you reset the original iter instead of declaring a second one by zeroing the mappings/processed fields, re-expanding iter::len, and resetting iter::flags? I guess we'll still do the flush if the start of the zeroing range aligns with an fsblock? I guess if you're going to do a lot of small extensions then once per fsblock isn't too bad? --D > + if (ret || !iter.len) > + return ret; > + } > + > + /* > + * To avoid an unconditional flush, check dirty state and defer the > + * flush until a combination of dirty pagecache and at least one > + * mapping that might convert on writeback is seen. > */ > range_dirty = filemap_range_needs_writeback(inode->i_mapping, > - pos, pos + len - 1); > + iter.pos, iter.pos + iter.len - 1); > while ((ret = iomap_iter(&iter, ops)) > 0) { > const struct iomap *s = iomap_iter_srcmap(&iter); > if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) { > -- > 2.46.2 > >