On Fri, Dec 22, 2023 at 01:29:18PM +0100, Hannes Reinecke wrote:
> And that is actually a very valid point; memory fragmentation will become an
> issue with larger block sizes.
>
> Theoretically it should be quite easily solved; just switch the memory
> subsystem to use the largest block size in the system, and run every smaller
> memory allocation via SLUB (or whatever the allocator-of-the-day
> currently is :-).  Then trivially the system will never be fragmented,
> and I/O can always use large folios.
>
> However, that means doing away with alloc_page(), which is still in
> widespread use throughout the kernel.  I would actually be in favour of it,
> but it might be that mm people have a different view.
>
> Matthew, worth a new topic?
> Handling memory fragmentation on large block I/O systems?

I think if we're going to do that as a topic (and I'm not opposed!), we
need data.  Various workloads, various block sizes, etc.  Right now
people discuss this topic with "feelings" and "intuition", and I think
we need more than vibes to have a productive discussion.

My laptop (rebooted last night due to an unfortunate upgrade that left
anything accessing the sound device hanging ...):

MemTotal:       16006344 kB
MemFree:         2353108 kB
Cached:          7957552 kB
AnonPages:       4271088 kB
Slab:             654896 kB

so ~50% of my 16GB of memory is in the page cache and ~25% is anon
memory.  If the page cache is all in 16kB chunks and we need to
allocate an order-2 folio in order to read from a file, we can find one
easily by reclaiming other order-2 folios from the page cache.  We
don't need to resort to heroics like eliminating the use of
alloc_page().

We should eliminate the use of alloc_page() across most of the kernel,
but that's a different topic, and one that has little relevance to
LSF/MM since it's drivers that need to change, not the MM ;-)

Now, other people "feel" differently.  And that's cool, but we're not
going to have a productive discussion without data that shows whose
feelings represent reality, and for which kinds of workloads.
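
For anyone who wants to reproduce the percentages above on their own
machine, here is a minimal sketch of a userspace helper (nothing that
exists in-tree, purely an illustration) which parses /proc/meminfo and
prints the Cached, AnonPages and Slab shares of MemTotal:

/*
 * Illustrative only: print the share of MemTotal held by the page
 * cache (Cached), anonymous memory (AnonPages) and slab (Slab).
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[256], key[64];
	unsigned long long kb, total = 0, cached = 0, anon = 0, slab = 0;

	if (!f) {
		perror("/proc/meminfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* lines look like "MemTotal:       16006344 kB" */
		if (sscanf(line, "%63[^:]: %llu", key, &kb) != 2)
			continue;
		if (!strcmp(key, "MemTotal"))
			total = kb;
		else if (!strcmp(key, "Cached"))
			cached = kb;
		else if (!strcmp(key, "AnonPages"))
			anon = kb;
		else if (!strcmp(key, "Slab"))
			slab = kb;
	}
	fclose(f);
	if (!total)
		return 1;
	printf("page cache %.1f%%  anon %.1f%%  slab %.1f%%\n",
	       100.0 * cached / total,
	       100.0 * anon / total,
	       100.0 * slab / total);
	return 0;
}

On the snapshot above it works out to roughly 49.7% page cache, 26.7%
anon and 4.1% slab, which is where the ~50% / ~25% estimate comes from.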