On Mon, Aug 30, 2021 at 11:28:18AM -0700, Darrick J. Wong wrote:
> On Sat, Aug 28, 2021 at 01:27:29PM -0600, Andreas Dilger wrote:
> > On Aug 28, 2021, at 1:04 PM, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > The current folio work is focused on permitting the VM to use
> > > physically contiguous chunks of memory. Both Darrick and Johannes
> > > have pointed out the advantages of supporting logically-contiguous,
> > > physically-discontiguous chunks of memory. Johannes wants to be able to
> > > use order-0 allocations to allocate larger folios, getting the benefit
> > > of managing the memory in larger chunks without requiring the memory
> > > allocator to be able to find contiguous chunks. Darrick wants to support
> > > non-power-of-two block sizes.
> >
> > What is the use case for non-power-of-two block sizes? The main question
> > is whether that use case is important enough to add the complexity and
> > overhead in order to support it?
>
> For copy-on-write to a XFS realtime volume where the allocation extent
> size (we support bigalloc too! :P) is not a power of two (e.g. you set
> up a 4 disk raid5 with 64k stripes, now the extent size is 192k).
>
> Granted, I don't think folios handling 192k chunks is absolutely
> *required* for folios; the only hard requirement is that if any page in
> a 192k extent becomes dirty, the rest have to get written out all the
> same time, and the cow remap can only happen after the last page
> finishes writeback.

I /think/ "all pages get written out at the same time" is basically the
same thing as "support a non-power-of-two block size". If we only have
page A in the cache at the time it's going to be written back, we have
to read in pages B and C in order to calculate the parity P. That will
annoy writeback-because-we're-low-on-memory; I know we allow a certain
amount of allocation to happen in the writeback path, but requiring
128kB to be allocated is a bit much.
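
To make the arithmetic above concrete, here is a rough sketch, not kernel code. It assumes that A, B and C each stand for one 64k stripe unit (which is how the 128kB figure works out) and that the parity is plain RAID5 XOR. The names CHUNK, DISKS and xor_parity are made up for the illustration.

```python
# Illustrative only: RAID5 stripe arithmetic for the 4-disk, 64k example.
CHUNK = 64 * 1024            # per-disk stripe unit (from the example)
DISKS = 4                    # 4-disk RAID5: 3 data units + 1 parity unit

DATA_UNITS = DISKS - 1
EXTENT = DATA_UNITS * CHUNK  # 192k of data per stripe, not a power of two
MISSING = EXTENT - CHUNK     # what must be read in if only A is cached


def xor_parity(units):
    """Return the byte-wise XOR of equal-length byte strings."""
    size = len(units[0])
    acc = 0
    for unit in units:
        if len(unit) != size:
            raise ValueError("all units must have the same length")
        acc ^= int.from_bytes(unit, "little")
    return acc.to_bytes(size, "little")


# Dummy contents for the three data units of one stripe.
a = bytes([0x01]) * CHUNK
b = bytes([0x02]) * CHUNK
c = bytes([0x04]) * CHUNK

# Computing P needs all three units, so B and C must be in memory too.
p = xor_parity([a, b, c])
```

With only A cached, MISSING works out to 128k, which is the allocation the writeback path would have to make before it could compute P.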

So we have to allow page A being dirty to pin pages B and C in the
cache. I suppose that's possible; we could make (clean) pages B and C
follow page A on the LRU, so they're going to still be in RAM at the
time that page A is written back. I don't fully understand how the LRU
works, but I assume it'd be a nightmare to ensure that A, B and C all
move around the system in the same way. Much easier to ensure that ABC
stay linked together and all get written back at once.
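
As a sketch of what "linked together and written back at once" implies for the range calculation, assuming 4k pages and the 192k extent from Darrick's example (the helper names here are hypothetical, and none of this is how the page cache actually does it):

```python
# Illustrative only: expand one dirty offset to its whole 192k extent.
PAGE_SIZE = 4096             # assumption: 4k pages
EXTENT = 192 * 1024          # non-power-of-two extent from the example


def writeback_range(dirty_offset, extent=EXTENT):
    """Return (start, end) byte offsets of the extent holding dirty_offset.

    The extent size is not a power of two, so this divides and multiplies
    instead of masking off low bits.
    """
    start = (dirty_offset // extent) * extent
    return start, start + extent


def writeback_pages(dirty_offset, extent=EXTENT, page_size=PAGE_SIZE):
    """Return the page indices that have to be written out together."""
    start, end = writeback_range(dirty_offset, extent)
    return range(start // page_size, end // page_size)


# One dirty byte at offset 200k drags in its whole extent.
pages = writeback_pages(200 * 1024)
```

The usual `offset & ~(size - 1)` rounding trick gives the wrong start for a 192k extent, which is one small way the non-power-of-two size leaks into the code.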