On Wed, Nov 20, 2024 at 12:43:47AM -0800, Christoph Hellwig wrote: > On Tue, Nov 19, 2024 at 10:46:54AM -0500, Brian Foster wrote: > > +loff_t > > +iomap_fill_dirty_folios( > > + struct inode *inode, > > + struct iomap *iomap, > > If you pass in the batch directly instead of the iomap this is > completely generic and could go into filemap.c. Adding willy > and linux-mm for these kinds of things also tends to help to > get good review feedback and often improvements. > That is a good idea, but I'm not sure this will remain generic. One of the tradeoffs this current patch makes is that for the sub-folio block size (whether small blocks or large folios), we zero any subrange so long as the high level folio is dirty. This causes something like generic/445 to fail on small block sizes because we explicitly zero a subrange that technically wasn't written to, but rather some other part of the same folio was, and therefore SEEK_DATA finds it after when a hole was expected based on the test workload. I was thinking of improving this by using ifs_find_dirty_range() here to further filter out unwritten subranges based on the target range being zeroed, but haven't done that yet. That said, I suspect that still won't be perfect and some spurious zeroing may still be possible. Personally, I think it might be fine to leave as is and just fix up the fstests test to be a little less strict about SEEK_DATA at block granularity. We have to writeback the folio anyways and so I'm not sure it makes much practical difference. So if there's preference to try and keep this generic in favor of that functional tradeoff, I'm certainly fine with going that route. Thoughts? (Hmm.. thinking more about it, it may also be possible to fix up via a post-process on the first/last folios in the batch, so maybe this doesn't technically have to be an either or choice.) > > + loff_t offset, > > + loff_t length) > > +{ > > + struct address_space *mapping = inode->i_mapping; > > + struct folio_batch fbatch; > > + pgoff_t start, end; > > + loff_t end_pos; > > + > > + folio_batch_init(&fbatch); > > + folio_batch_init(&iomap->fbatch); > > + > > + end_pos = offset + length; > > + start = offset >> PAGE_SHIFT; > > + end = (end_pos - 1) >> PAGE_SHIFT; > > Nit: initializing these at declaration time make the code easier to > read (at least for me :)). > Ok. > > + > > + while (filemap_get_folios(mapping, &start, end, &fbatch) && > > + folio_batch_space(&iomap->fbatch)) { > > + struct folio *folio; > > + while ((folio = folio_batch_next(&fbatch))) { > > + if (folio_trylock(folio)) { > > + bool clean = !folio_test_dirty(folio) && > > + !folio_test_writeback(folio); > > + folio_unlock(folio); > > + if (clean) > > + continue; > > What does the lock protect here given that it can become stale as soon > as we unlock? > I thought the lock was needed to correctly test for dirty and writeback, because dirty is cleared before writeback is set under folio lock. That said, I suppose this could also just do the folio_test_locked() thing that filemap_range_has_writeback() does. Just note that this is possibly affected by how we decide to handle the sub-folio case above, as I think we'd also need the lock for looking at the iomap_folio_state. > Note that there also is a filemap_get_folios_tag that only looks up > folios with the right tag (dirty or writeback). Currently it only > supports a single tag, which wuld not be helpful here, but this might > be worth talking to the pagecache and xarray maintainer. > Yeah, that was one of the first things I looked into before writing the fill helper, but atm it didn't look possible to check for an OR combination of xarray tags. That might be a nice future improvement if reasonably possible, or if somebody who knows that code better wants to cook something up faster than I can, but otherwise I don't really want to make that a hard requirement. Brian