On Wed, Jul 29, 2020 at 03:19:23PM +1000, Dave Chinner wrote:
> On Wed, Jul 29, 2020 at 03:12:31AM +0100, Matthew Wilcox wrote:
> > On Wed, Jul 29, 2020 at 11:54:58AM +1000, Dave Chinner wrote:
> > > On Tue, Jul 28, 2020 at 04:47:53PM +0100, Matthew Wilcox wrote:
> > > > I propose we do away with the 'uptodate' bit-array and replace it with an
> > > > 'writeback' bit-array. We set the page uptodate bit whenever the reads to
> > >
> > > That's just per-block dirty state tracking. But when we set a single
> > > bit, we still need to set the page dirty flag.
> >
> > It's not exactly dirty, though. It's 'present' (ie the opposite
> > of hole).
>
> Careful with your terminology. At the page cache level, there is no
> such thing as a "hole". There is only data and whether the data is
> up to date or not. The page cache may be *sparsely populated*, but
> a lack of a page or a range of the page that is not up to date
> does not imply there is a -hole in the file- at that point.

That's not entirely true. The current ->uptodate array does keep
track of whether an unwritten extent is currently a hole (see
page_cache_seek_hole_data()). I don't know how useful that is.

> I'm still not sure what "present" is supposed to mean, though,
> because it seems no different to "up to date". The data is present
> once it's been read into the page, calling page_mkwrite() on the
> page doesn't change that at all.

I had a bit of a misunderstanding. Let's discard that proposal
and discuss what we want to optimise for, ignoring THPs. We don't
need to track any per-block state, of course. We could implement
__iomap_write_begin() by reading in the entire page (skipping the
last few blocks if they lie outside i_size, of course) and then
marking the entire page Uptodate.

Buffer heads track several bits of information about each block:
 - Uptodate (contents of cache at least as recent as storage)
 - Dirty (contents of cache more recent than storage)
 - ... er, I think all the rest are irrelevant for iomap

I think I just talked myself into what you were arguing for -- that
we change the ->uptodate bit array into a ->dirty bit array.

That implies that we lose the current optimisation that we can write at
a blocksize alignment into the page cache and not read from storage.
I'm personally fine with that; most workloads don't care if you read
extra bytes from storage (hence readahead), but writing unnecessarily
to storage (particularly flash) is bad.

Or we keep two bits per block. The implementation would be a little
icky, but it could be done.

I like the idea of getting rid of partially uptodate pages. I've never
really understood the concept. For me, a partially dirty page makes a
lot more sense than a partially uptodate page. Perhaps I'm just weird.

Speaking of weird, I don't understand why an unwritten extent queries
the uptodate bits. Maybe that's a buffer_head thing and we can just
ignore it -- iomap doesn't have such a thing as a !uptodate page any
more.
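
To make the "two bits per block" option concrete, here is a rough
userspace sketch, not kernel code and not a proposed patch. Everything
in it is hypothetical: the struct page_blocks layout, the helper names,
and the cap of 64 blocks per page are assumptions made only for
illustration. It keeps one Uptodate bit and one Dirty bit per block, so
a block-aligned write can skip the read from storage while writeback
still touches only the dirty blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-page state; at most 64 blocks per page in this sketch. */
struct page_blocks {
	unsigned int nr_blocks;	/* blocks in this page, 1..64 */
	uint64_t uptodate;	/* bit i: cache at least as recent as storage */
	uint64_t dirty;		/* bit i: cache more recent than storage */
};

/* Mask with bits first..last (inclusive) set. */
static uint64_t block_mask(unsigned int first, unsigned int last)
{
	assert(first <= last && last < 64);
	uint64_t hi = (last == 63) ? ~0ULL : ((1ULL << (last + 1)) - 1);
	uint64_t lo = (1ULL << first) - 1;
	return hi & ~lo;
}

/* Block-aligned write: covered blocks become uptodate and dirty, no read. */
static void write_blocks(struct page_blocks *pb, unsigned int first,
			 unsigned int last)
{
	uint64_t mask = block_mask(first, last);

	pb->uptodate |= mask;
	pb->dirty |= mask;
}

/* Reading the remaining blocks from storage makes them uptodate, not dirty. */
static void read_missing_blocks(struct page_blocks *pb)
{
	pb->uptodate |= block_mask(0, pb->nr_blocks - 1);
}

/* The page as a whole is uptodate only once every block is. */
static int page_uptodate(const struct page_blocks *pb)
{
	return pb->uptodate == block_mask(0, pb->nr_blocks - 1);
}

/* Number of blocks writeback would have to send to storage. */
static unsigned int blocks_to_write(const struct page_blocks *pb)
{
	unsigned int n = 0;

	for (uint64_t d = pb->dirty; d; d &= d - 1)
		n++;
	return n;
}

/* Writeback completed: nothing in the cache is newer than storage. */
static void writeback_done(struct page_blocks *pb)
{
	pb->dirty = 0;
}
```

With only a ->dirty array, write_blocks() would first need the
equivalent of read_missing_blocks() whenever the page is not already
Uptodate; that extra read is the optimisation given up in exchange for
dropping the second bit.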