On Tue, Aug 03, 2021 at 04:28:14PM +0100, Matthew Wilcox wrote:
> Solution 1: Add an array of dirty bits to the iomap_page
> data structure. This patch already exists; would need
> to be adjusted slightly to apply to the current tree.
> https://lore.kernel.org/linux-xfs/7fb4bb5a-adc7-5914-3aae-179dd8f3adb1@xxxxxxxxxx/

> Solution 2a: Replace the array of uptodate bits with an array of
> dirty bits. It is not often useful to know which parts of the page are
> uptodate; usually the entire page is uptodate. We can actually use the
> dirty bits for the same purpose as uptodate bits; if a block is dirty, it
> is definitely uptodate. If a block is !dirty, and the page is !uptodate,
> the block may or may not be uptodate, but it can be safely re-read from
> storage without losing any data.

1 or 2a seems like something we should do once we have large folio support.

> Solution 2b: Lose the concept of partially uptodate pages. If we're
> going to write to a partial page, just bring the entire page uptodate
> first, then write to it. It's not clear to me that partially-uptodate
> pages are really useful. I don't know of any network filesystems that
> support partially-uptodate pages, for example. It seems to have been
> something we did for buffer_head based filesystems "because we could"
> rather than finding a workload that actually cares.

The uptodate bit is important for the use case of a smaller than page
size buffered write into a page that hasn't been read in already, which
is fairly common for things like log writes. So I'd hate to lose this
optimization.

> (it occurs to me that solution 3 actually allows us to do IOs at storage
> block size instead of filesystem block size, potentially reducing write
> amplification even more, although we will need to be a bit careful if
> we're doing a CoW.)

Number 3 might be a nice optimization. The even better version would be
a disk format change to just log those updates in the log and otherwise
use the normal dirty mechanism.
I once had a crude prototype for that.
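
To make the 2a rule and the sub-page write case concrete, here is a rough
userspace sketch. It is not the iomap code: struct page_state,
block_uptodate(), write_begin() and write_end() are hypothetical names made
up for illustration, and it assumes at most 64 blocks per page and len > 0.
It only shows that a block-aligned write into a page that was never read
needs no read at all, while a partial-block write has to read the affected
block first unless it is already dirty or the page is uptodate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page state: one dirty bit per filesystem block. */
struct page_state {
	bool     uptodate;	/* the whole page is known to be uptodate */
	uint64_t dirty;		/* bit i set: block i is dirty, hence uptodate */
};

/* Solution 2a rule: a block is trusted if the page is uptodate or it is dirty. */
static bool block_uptodate(const struct page_state *ps, unsigned int blk)
{
	return ps->uptodate || ((ps->dirty >> blk) & 1);
}

/*
 * Prepare a buffered write of [off, off + len) within the page (len > 0).
 * Returns a bitmask of blocks that must be read from storage first: only
 * the first and last block can be partially covered, and only those need
 * a read, and only if they are not already uptodate.
 */
static uint64_t write_begin(const struct page_state *ps, unsigned int blksz,
			    unsigned int off, unsigned int len)
{
	unsigned int first = off / blksz;
	unsigned int last = (off + len - 1) / blksz;
	uint64_t need_read = 0;

	if ((off % blksz) && !block_uptodate(ps, first))
		need_read |= 1ULL << first;
	if (((off + len) % blksz) && !block_uptodate(ps, last))
		need_read |= 1ULL << last;
	return need_read;
}

/* After the copy, every block touched by the write is dirty. */
static void write_end(struct page_state *ps, unsigned int blksz,
		      unsigned int off, unsigned int len)
{
	unsigned int first = off / blksz;
	unsigned int last = (off + len - 1) / blksz;

	for (unsigned int blk = first; blk <= last; blk++)
		ps->dirty |= 1ULL << blk;
}
```

With 512-byte blocks, a 512-byte write at offset 1024 into a fresh page
returns an empty read mask from write_begin(), and write_end() then sets only
bit 2. Under 2b the same write would have to read the entire page first.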