On Mon, Oct 31, 2022 at 03:43:24AM +0000, Matthew Wilcox wrote:
> I agree that bufferheads do bottom-up dirty tracking, but I don't think
> that what Ritesh is doing here is bottom-up dirty tracking. Buffer
> heads expose an API to dirty a block, which necessarily goes bottom-up.
> There's no API here to dirty a block. Instead there's an API to dirty
> a range of a folio, so we're still top-down; we're just keeping track
> of it in a more precise way.

Agreed.

> If there is any dirty region, the folio must be marked dirty (otherwise
> we'll never know that it needs to be written back). The interesting
> question (as your paragraph below hints) is whether removing the dirty
> part of a folio from a file marks the folio clean. I believe that's
> optional, but it's probably worth doing.

Also agreed.

> > What happens with direct extent manipulation like fallocate()
> > operations? These invalidate the parts of the page cache over the
> > range we are punching, shifting, etc, without interacting directly
> > with iomap, so do we now have to ensure that the sub-folio dirty
> > regions are also invalidated correctly? i.e. do functions like
> > xfs_flush_unmap_range() need to become iomap infrastructure so that
> > they can update sub-folio dirty ranges correctly?
>
> I'm slightly confused by this question. As I understand the various
> fallocate operations, they start by kicking out all the folios affected
> by the operation (generally from the start of the operation to EOF),
> so we'd writeback the (dirty part of) folios which are dirty, then
> invalidate the folios in cache. I'm not sure there's going to be
> much difference.

Yes. As far as I can tell all pagecache manipulation for the fallocate
operations is driven by the file system, and it is only done for the
punch/zero/move ranges. The file system then goes through the normal
pagecache truncate helpers rounded to the block size, which should do
the right thing through the ops (a rough sketch of that flow is below).

> Yes. This is also going to be a performance problem. Marking a folio as
> dirty is no longer just setting the bit in struct folio and the xarray
> but also setting all the bits in iop->state. Depending on the size
> of the folio, and the fs blocksize, this could be quite a lot of bits.
> eg a 2MB folio with a 1k block size is 2048 bits (256 bytes, 6 cachelines
> (it dirties the spinlock in cacheline 0, then the bitmap occupies 3 full
> cachelines and 2 partial ones)).

We can always optimize that by having a bit for the fairly common
all-dirty case, and only track and look at the array if that is not
the case (sketch below as well).

> filesystems right now. Dave Howells' netfs infrastructure is trying
> to solve the problem for everyone (and he's been looking at iomap as
> inspiration for what he's doing).

Btw, I never understood why the network file systems don't just use
iomap. There is nothing block specific in the core iomap code.
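
For the fallocate flow mentioned above, this is roughly the pattern
I mean. A simplified sketch, not the actual XFS code; it is modeled
loosely on what xfs_flush_unmap_range() does, and the rounding details
are approximated:

/*
 * Sketch: write back and then drop the pagecache for the range a
 * punch/zero/move operation is about to touch.  The per-block dirty
 * bits go away together with the folios that carry them.
 */
static int flush_unmap_range(struct inode *inode, loff_t offset,
		loff_t len, unsigned int blocksize)
{
	loff_t rounding = max_t(loff_t, blocksize, PAGE_SIZE);
	loff_t start = round_down(offset, rounding);
	loff_t end = round_up(offset + len, rounding) - 1;
	int error;

	/* write back any dirty (sub-)folio state in the affected range */
	error = filemap_write_and_wait_range(inode->i_mapping, start, end);
	if (error)
		return error;

	/* then invalidate the folios covering the range */
	truncate_pagecache_range(inode, start, end);
	return 0;
}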
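
And to spell out the all-dirty optimization, a purely illustrative
sketch - the flag, field and function names are made up for this mail
and not taken from the actual series:

/*
 * Keep an "all dirty" fast path so that the common case of dirtying a
 * whole folio never has to walk the per-block bitmap.
 */
static void iop_set_range_dirty(struct inode *inode, struct folio *folio,
		struct iomap_page *iop, size_t off, size_t len)
{
	unsigned int first = off >> inode->i_blkbits;
	unsigned int nr = ((off + len - 1) >> inode->i_blkbits) - first + 1;
	unsigned long irq_flags;

	if (test_bit(IOP_ALL_DIRTY, &iop->flags))
		return;		/* already fully dirty, nothing to update */

	if (off == 0 && len == folio_size(folio)) {
		/* one bit instead of up to 2048 for the whole-folio case */
		set_bit(IOP_ALL_DIRTY, &iop->flags);
		return;
	}

	/* slow path: mark just the blocks covered by [off, off + len) */
	spin_lock_irqsave(&iop->state_lock, irq_flags);
	bitmap_set(iop->state, first, nr);
	spin_unlock_irqrestore(&iop->state_lock, irq_flags);
}

Writeback then only needs to look at the bitmap when the all-dirty bit
is not set.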