On Sun, Jan 12, 2025 at 08:51:12PM -0800, Christoph Hellwig wrote:
> On Fri, Jan 10, 2025 at 12:53:19PM -0500, Brian Foster wrote:
> > processing.
> > 
> > For example, if we have a largish dirty folio backed by an unwritten
> > extent with maybe a single block that is actually dirty, would we be
> > alright to just zero the requested portion of the folio as long as
> > some part of the folio is dirty? Given the historical ad hoc nature
> > of XFS speculative prealloc zeroing, personally I don't see that as
> > much of an issue in practice as long as subsequent reads return
> > zeroes, but I could be missing something.
> 
> That's a very good question I haven't thought about much yet. And
> every time I try to think of the speculative preallocations and their
> implications my head begins to implode..
> 

Heh. Just context on my thought process, FWIW..

We've obviously zeroed the newly exposed file range for writes that
start beyond EOF for quite some time. This new post-eof range may or
may not have been backed by speculative prealloc, and if so, that
prealloc may be either delalloc or unwritten extents depending on
whether writeback occurred on the EOF extent (assuming large enough
free extents, etc.) before the extending write.

In turn, this means that the zero range done for an extending write
would have either physically zeroed delalloc extents or skipped
unwritten blocks, depending on the situation. Personally, I don't think
it really matters which, as there is no real guarantee that "all blocks
not previously written to are unwritten," for example, but rather just
that "all blocks not written to return zeroes on read." For that
reason, I'm _hoping_ we can keep this simple and just deal with some
potential spurious zeroing on folios that are already dirty, but I'm
open to arguments against that.

Note that the post-eof zero range behavior changed in XFS sometime over
the past few releases in that it now always converts post-eof delalloc
to unwritten in iomap_begin(), but IIRC this was to deal with some
other unrelated issue. I also don't think that change is necessarily
right, because it can significantly increase the rate of physical block
allocations in some workloads, but that's a separate issue,
particularly now that zero range doesn't update i_size.. ;P

> > > >  static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> > > >  {
> > > > +	if (iter->fbatch) {
> > > > +		folio_batch_release(iter->fbatch);
> > > > +		kfree(iter->fbatch);
> > > > +		iter->fbatch = NULL;
> > > > +	}
> > > 
> > > Does it make sense to free the fbatch allocation on every
> > > iteration, or should we keep the memory allocation around and only
> > > free it after the last iteration?
> > > 
> > 
> > In the current implementation the existence of the fbatch is what
> > controls the folio lookup path, so we'd only want it for unwritten
> > mappings. That said, this could be done differently with a flag or
> > something that indicates whether to use the batch. Given that we
> > release the folios anyway and zero range isn't the most frequent
> > thing, I figured this keeps things simple for now. I don't really
> > have a strong preference for either approach, however.
> 
> I was just worried about the overhead of allocating and freeing
> it all the time. OTOH we probably rarely have more than a single
> extent to process with the batch right now.
> 

Ok. This maintains zero range performance in my testing so far, so I'm
going to maintain simplicity for now until there's a reason to do
otherwise.
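Just to illustrate the alternative being discussed (rough sketch only,
not from the posted patches): the "keep the allocation around" variant
could look something like the below, where iomap_iter_done() is a
made-up name for a final teardown hook and the rest of the
per-iteration reset is elided:

static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
{
	/* drop the folio refs, but keep the batch for the next mapping */
	if (iter->fbatch)
		folio_batch_release(iter->fbatch);
}

/* hypothetical helper, called once after the final iteration */
static inline void iomap_iter_done(struct iomap_iter *iter)
{
	if (iter->fbatch) {
		folio_batch_release(iter->fbatch);
		kfree(iter->fbatch);
		iter->fbatch = NULL;
	}
}

That only really pays off if the batch gets reused across multiple
mappings, which per the above is rare for zero range today.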
I'm open to changing it in future iterations. I suspect it might change
anyway if this ends up being used for more operations...

Thanks again for the comments.

Brian