On Wed, Jul 29, 2020 at 03:12:31AM +0100, Matthew Wilcox wrote:
> On Wed, Jul 29, 2020 at 11:54:58AM +1000, Dave Chinner wrote:
> > On Tue, Jul 28, 2020 at 04:47:53PM +0100, Matthew Wilcox wrote:
> > > I propose we do away with the 'uptodate' bit-array and replace it with an
> > > 'writeback' bit-array. We set the page uptodate bit whenever the reads to
> >
> > That's just per-block dirty state tracking. But when we set a single
> > bit, we still need to set the page dirty flag.
>
> It's not exactly dirty, though. It's 'present' (ie the opposite
> of hole).

Careful with your terminology. At the page cache level, there is no
such thing as a "hole". There is only data and whether the data is
up to date or not. The page cache may be *sparsely populated*, but
a lack of a page or a range of the page that is not up to date
does not imply there is a -hole in the file- at that point.

I'm still not sure what "present" is supposed to mean, though,
because it seems no different to "up to date". The data is present
once it's been read into the page, calling page_mkwrite() on the
page doesn't change that at all.

> I'm not attached to the name. So it can be used to
> implement iomap_is_partially_uptodate. If the page is dirty, the chunks
> corresponding to the present bits get written back, but we don't track
> a per-block dirty state.

iomap_is_partially_uptodate() only indicates whether data in the
page is entirely valid or not. If it isn't entirely valid, then the
caller has to ask the filesystem whether the underlying range
contains holes or data....

> > > fill the page have completed rather than checking the 'writeback' array.
> > > In page_mkwrite, we fill the writeback bit-array on the grounds that we
> > > have no way to track a block's non-dirtiness and we don't want to scan
> > > each block at writeback time to see if it's been written to.
> >
> > You're talking about mmap() access to the file here, not
> > read/write() syscall access.
> > If page_mkwrite() sets all the
> > blocks in a page as "needing writeback", how is that different in
> > any way to just using a single dirty bit? So why wouldn't we just do
> > this in iomap_set_page_dirty()?
>
> iomap_set_page_dirty() is called from iomap_page_mkwrite_actor(), so
> sure!

via set_page_dirty(), which is why I mentioned this:

> > The only place we wouldn't want to set the entire page dirty is
> > the call from __iomap_write_end() which knows the exact range of the
> > page that was dirtied. In which case, iomap_set_page_dirty_range()
> > would be appropriate, right? i.e. we still have to do all the same
> > page/page cache/inode dirtying, but only that would set a sub-page
> > range of dirty bits in the iomap_page?
> >
> > /me doesn't see the point of calling dirty tracking "writeback bits"
> > when "writeback" is a specific page state that comes between the
> > "dirty" and "clean" states...
>
> I don't want to get it confused with page states. This is a different
> thing. It's just tracking which blocks are holes (and have definitely
> not been written to), so those blocks can remain as holes when the page
> gets written back.

We do not track holes at the page level. We do not want to track
anything to do with the filesystem extent mapping at the page level.
That was something that bufferheads were used for, and was something
we specifically designed iomap not to require.

IOWs, iomap does page cache IO at page level granularity, not block
level granularity. The only thing we track at block granularity is
whether the range of the page over a given block contains valid data
or not. i.e. whether the page has been initialised with the correct
data or not.

Further, page_mkwrite() has no knowledge of whether the backing
store has holes in it or not, nor does it care. All it does is call
into the filesystem to fill any holes that may exist in the backing
space behind the page.
This is also needed for COW to allocate the
destination of the overwrite, but in either case there is no
interaction with pre-existing holes - that is all done by the read
side of the page fault before page_mkwrite is called...

IOWs, if you call page_mkwrite() on a THP, the filesystem will
allocate/reserve the entire backing space behind the page because
writeback of that THP requires writing the entire page and for
backing space to be fully allocated before that write is issued.

Hence I'm really not sure what you are suggesting we do here because
it doesn't make sense to me. Maybe I'm missing something that THP
does that I'm not aware of, but other than that I'm completely
missing what you are trying to do here...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
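
For readers following the thread, the block-granularity tracking
described above (one bit per block recording whether that part of the
page holds valid data, plus a single page-level dirty flag) can be
sketched in userspace C. This is a simplified model under stated
assumptions, not the kernel's iomap code: struct page_model,
MODEL_PAGE_SIZE and the model_* helpers are hypothetical names, the
page is fixed at 4096 bytes, and ranges passed to
model_set_range_uptodate() are assumed to be block aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the kernel's struct iomap_page. */
#define MODEL_PAGE_SIZE 4096u

struct page_model {
	unsigned int block_size; /* bytes per block; 64..4096, divides 4096 */
	uint64_t uptodate;       /* bit n set: block n holds valid data */
	bool dirty;              /* one dirty flag for the whole page */
};

static unsigned int model_blocks_per_page(const struct page_model *p)
{
	assert(p->block_size >= 64 && MODEL_PAGE_SIZE % p->block_size == 0);
	return MODEL_PAGE_SIZE / p->block_size;
}

/* Record that the block-aligned range [off, off + len) now holds valid data. */
static void model_set_range_uptodate(struct page_model *p,
				     unsigned int off, unsigned int len)
{
	assert(len > 0 && off + len <= MODEL_PAGE_SIZE);
	assert(off % p->block_size == 0 && len % p->block_size == 0);

	unsigned int first = off / p->block_size;
	unsigned int last = (off + len - 1) / p->block_size;

	for (unsigned int i = first; i <= last; i++)
		p->uptodate |= (uint64_t)1 << i;
}

/* True if every block touched by [off, off + len) holds valid data. */
static bool model_range_is_uptodate(const struct page_model *p,
				    unsigned int off, unsigned int len)
{
	assert(len > 0 && off + len <= MODEL_PAGE_SIZE);

	unsigned int first = off / p->block_size;
	unsigned int last = (off + len - 1) / p->block_size;

	for (unsigned int i = first; i <= last; i++)
		if (!(p->uptodate & ((uint64_t)1 << i)))
			return false;
	return true;
}

/* True once all blocks in the page hold valid data. */
static bool model_page_is_uptodate(const struct page_model *p)
{
	unsigned int n = model_blocks_per_page(p);
	uint64_t mask = (n == 64) ? ~(uint64_t)0 : (((uint64_t)1 << n) - 1);

	return (p->uptodate & mask) == mask;
}

/* A write fault dirties the whole page; no per-block dirty state is kept. */
static void model_mkwrite(struct page_model *p)
{
	p->dirty = true;
}
```

With 1024-byte blocks, marking bytes 1024..3071 up to date makes
model_range_is_uptodate() true for any range inside those two blocks
and false for a range touching block 0, while the page as a whole is
not up to date until all four bits are set. model_mkwrite() touches
nothing but the page-level flag, and nothing in the model records
whether the backing store has holes.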