As I mentioned here [1], I have Thoughts on how the PCP allocator works in a memdesc world. Unlike my earlier Thoughts on the buddy allocator [2], we can actually make progress towards this one (and see substantial performance improvement, I believe). So it's ripe for someone to pick up.

== With memdescs ==

When we have memdescs, allocating a folio from the buddy is a two-step process. First we allocate the struct folio from slab, then we ask the buddy allocator for 2^n pages, each of which gets its memdesc set to point to this folio. It'll be similar for other memory descriptors, but let's keep it simple and just talk about folios for now.

Usually when we free folios, it's due to memory pressure (yes, we'll free memory due to truncating a file or processes exiting and freeing their anonymous memory, but that's secondary). That means we're likely to want to allocate a folio again soon. Given that, returning the struct folio to the slab allocator seems like a waste of time. The PCP allocator can hold onto the struct folio as well as the underlying memory and just hand both back to the next caller of folio_alloc(). This also saves us from having to invent a 'struct pcpdesc' and swap the memdesc pointer from the folio to the pcpdesc.

This implies that we no longer have a single PCP allocator for all types of memory; rather we have one for each memdesc type. I think that's going to be OK, but it might introduce some problems.

== Before memdescs ==

Today we take all comers on the PCP list. __free_pages() calls free_the_page() calls free_unref_page() calls free_unref_page_prepare() calls free_pages_prepare(), which undoes all the PageCompound work. Most multi-page allocations are compound: slab, file, anon; it's all compound. I propose that we _only_ keep compound memory on the PCP list. Freeing non-compound multi-page memory can either convert it into compound pages before it is placed on the PCP list, or just hand the memory back to the buddy allocator. Non-compound multi-page allocations can either go straight to buddy, or grab from the PCP list and undo the compound nature of the pages.

I think this could be a huge saving. Consider allocating an order-9 PMD-sized THP. Today we initialise compound_head in each of the 511 tail pages. Since struct page is 64 bytes, we touch 32kB of memory! That's 2/3 of my CPU's L1 D$, so it's just pushed out a good chunk of my working set. And it's all dirty, so it has to get written back.

We still need to distinguish folios specifically (which need the folio_prep_large_rmappable() call on allocation and folio_undo_large_rmappable() on free) from other compound allocations which do not need or want this, but that's touching one or two extra cachelines, not 511.

Do we have a volunteer?

[1] https://lore.kernel.org/linux-mm/Za2lS-jG1s-HCqbx@xxxxxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/linux-mm/ZamnIGxD8_dOJVi6@xxxxxxxxxxxxxxxxxxxx/
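
To make the two-step allocation described under "With memdescs" a bit more concrete, here's a very rough sketch. None of this exists today: folio_cache, page_set_memdesc() and folio_attach_memory() are names I've invented purely for illustration.

struct folio *folio_alloc_memdesc(gfp_t gfp, unsigned int order)
{
	struct folio *folio;
	struct page *page;
	unsigned long i;

	/* Step 1: the descriptor comes from slab */
	folio = kmem_cache_alloc(folio_cache, gfp);
	if (!folio)
		return NULL;

	/* Step 2: the memory comes from the buddy allocator */
	page = alloc_pages(gfp, order);
	if (!page) {
		kmem_cache_free(folio_cache, folio);
		return NULL;
	}

	/* Each page's memdesc points back at the folio */
	for (i = 0; i < (1UL << order); i++)
		page_set_memdesc(page + i, folio);	/* invented helper */

	folio_attach_memory(folio, page, order);	/* invented helper */
	return folio;
}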
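
And a sketch of the PCP side: a per-CPU, per-memdesc-type cache that keeps the struct folio and its 2^n pages together, so a free followed by an allocation touches neither slab nor buddy nor the tail pages. Again, folio_pcp, folio->pcp_list, __folio_free_slow() and __folio_alloc_slow() are invented names; locking (today's PCP uses a spinlock / local lock) and per-order lists are elided to keep it short.

struct folio_pcp {
	struct list_head folios;	/* freed folios, memory still attached */
	int count;
	int high;			/* when to start giving memory back */
};
static DEFINE_PER_CPU(struct folio_pcp, folio_pcp);

static void pcp_folio_free(struct folio *folio)
{
	struct folio_pcp *pcp = this_cpu_ptr(&folio_pcp);

	if (pcp->count >= pcp->high) {
		/* Cache full: pages back to buddy, folio back to slab */
		__folio_free_slow(folio);
		return;
	}
	folio_undo_large_rmappable(folio);	/* folio-specific teardown */
	list_add(&folio->pcp_list, &pcp->folios);
	pcp->count++;
}

static struct folio *pcp_folio_alloc(gfp_t gfp, unsigned int order)
{
	struct folio_pcp *pcp = this_cpu_ptr(&folio_pcp);
	struct folio *folio;

	/* Fast path: reuse a cached folio; no re-initialisation of tails */
	list_for_each_entry(folio, &pcp->folios, pcp_list) {
		if (folio_order(folio) != order)
			continue;
		list_del(&folio->pcp_list);
		pcp->count--;
		folio_prep_large_rmappable(folio);	/* folio-specific prep */
		return folio;
	}
	return __folio_alloc_slow(gfp, order);	/* slab + buddy, as above */
}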
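
Finally, the "Before memdescs" proposal as a sketch of the free path. free_unref_page(), prep_compound_page() and PageHead() are real; the structure is simplified and reuses the free_the_page() name from the call chain above, but it is obviously not that function's real body and not meant as the actual patch. The allocation side would mirror it: non-compound multi-page callers either go straight to buddy or take a compound page off the PCP list and undo the compound state there.

static void free_the_page(struct page *page, unsigned int order)
{
	if (order == 0 || PageHead(page)) {
		/*
		 * Compound (or single) pages keep compound_head intact in
		 * the tails; in this proposal free_pages_prepare() would no
		 * longer undo the PageCompound work on the way to the PCP
		 * list.
		 */
		free_unref_page(page, order);
		return;
	}

	/*
	 * Non-compound multi-page free is the rare case.  Option 1:
	 * convert it to compound so it can sit on the PCP list like
	 * everything else.
	 */
	prep_compound_page(page, order);
	free_unref_page(page, order);

	/*
	 * Option 2 would be to skip the PCP list entirely and hand the
	 * memory straight back to the buddy allocator.
	 */
}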