On Sun, Jan 12, 2025 at 08:51:12PM -0800, Christoph Hellwig wrote:
> On Fri, Jan 10, 2025 at 12:53:19PM -0500, Brian Foster wrote:
> > processing.
> > 
> > For example, if we have a largish dirty folio backed by an unwritten
> > extent with maybe a single block that is actually dirty, would we be
> > alright to just zero the requested portion of the folio as long as
> > some part of the folio is dirty? Given the historical ad hoc nature
> > of XFS speculative prealloc zeroing, personally I don't see that as
> > much of an issue in practice as long as subsequent reads return
> > zeroes, but I could be missing something.
> 
> That's a very good question I haven't thought about much yet. And
> every time I try to think of the speculative preallocations and their
> implications my head begins to implode..
> 

Heh. Just context on my thought process, FWIW..

We've obviously zeroed the newly exposed file range for writes that
start beyond EOF for quite some time. This new post-eof range may or
may not have been backed by speculative prealloc, and if so, that
prealloc may be either delalloc or unwritten extents depending on
whether writeback occurred on the EOF extent (assuming large enough
free extents, etc.) before the extending write.

In turn, this means that the zero range done for an extending write
would have either physically zeroed delalloc extents or skipped
unwritten blocks, depending on the situation. Personally, I don't think
it really matters which, as there is no real guarantee that "all blocks
not previously written to are unwritten," for example, but rather just
that "all blocks not written to return zeroes on read." For that
reason, I'm _hoping_ we can keep this simple and just deal with some
potential spurious zeroing on folios that are already dirty, but I'm
open to arguments against that.

Note that the post-eof zero range behavior changed in XFS sometime over
the past few releases in that it now always converts post-eof delalloc
to unwritten in iomap_begin(), but IIRC this was to deal with some
other unrelated issue. I also don't think that change is necessarily
right, because it can significantly increase the rate of physical block
allocations in some workloads, but that's a separate issue,
particularly now that zero range doesn't update i_size.. ;P

> > > >  static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
> > > >  {
> > > > +	if (iter->fbatch) {
> > > > +		folio_batch_release(iter->fbatch);
> > > > +		kfree(iter->fbatch);
> > > > +		iter->fbatch = NULL;
> > > > +	}
> > > 
> > > Does it make sense to free the fbatch allocation on every
> > > iteration, or should we keep the memory allocation around and only
> > > free it after the last iteration?
> > > 
> > 
> > In the current implementation the existence of the fbatch is what
> > controls the folio lookup path, so we'd only want it for unwritten
> > mappings. That said, this could be done differently with a flag or
> > something that indicates whether to use the batch. Given that we
> > release the folios anyway and zero range isn't the most frequent
> > thing, I figured this keeps things simple for now. I don't really
> > have a strong preference for either approach, however.
> 
> I was just worried about the overhead of allocating and freeing
> it all the time. OTOH we probably rarely have more than a single
> extent to process with the batch right now.
> 

Ok. This maintains zero range performance in my testing so far, so I'm
going to maintain simplicity for now until there's a reason to do
otherwise.
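Just to illustrate the alternative being discussed (rough sketch only,
not from the posted patches): the "keep the allocation around" variant
could look something like the below, where iomap_iter_done() is a
made-up name for a final teardown hook and the rest of the
per-iteration reset is elided:

static inline void iomap_iter_reset_iomap(struct iomap_iter *iter)
{
	/* drop the folio refs, but keep the batch for the next mapping */
	if (iter->fbatch)
		folio_batch_release(iter->fbatch);
}

/* hypothetical helper, called once after the final iteration */
static inline void iomap_iter_done(struct iomap_iter *iter)
{
	if (iter->fbatch) {
		folio_batch_release(iter->fbatch);
		kfree(iter->fbatch);
		iter->fbatch = NULL;
	}
}

That only really pays off if the batch gets reused across multiple
mappings, which per the above is rare for zero range today.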
I'm open to changing it in future iterations. I suspect it might change
anyway if this ends up being used for more operations...

Thanks again for the comments.

Brian