Other than the obvious "remove calls to compound_head()" changes, the
fundamental belief here is that iterating a linked list is much slower
than iterating an array (5-15x slower in my testing).  There's also an
associated belief that since we iterate the batch of folios three times,
we do better when the array is small (i.e. 15 entries) than we do with
a batch that is hundreds of entries long, which only gives the first
pages the opportunity to fall out of cache by the time we get to the
end.  It is possible we should increase the size of folio_batch.
Hopefully the bots will let us know if this introduces any performance
regressions.

v3:
 - Rebased on next-20240227
 - Add folios_put_refs() to support unmapping large PTE-mapped folios
 - Used folio_batch_reinit() instead of assigning 0 to fbatch->nr.
   This makes sure the iterator is correctly reset.

v2:
 - Redo the shrink_folio_list() patch to free the mapped folios at
   the end instead of calling try_to_unmap_flush() more often
 - Improve a number of commit messages
 - Use pcp_allowed_order() instead of PAGE_ALLOC_COSTLY_ORDER (Ryan)
 - Fix move_folios_to_lru() comment (Ryan)
 - Add patches 15-18
 - Collect R-b tags from Ryan

Matthew Wilcox (Oracle) (18):
  mm: Make folios_put() the basis of release_pages()
  mm: Convert free_unref_page_list() to use folios
  mm: Add free_unref_folios()
  mm: Use folios_put() in __folio_batch_release()
  memcg: Add mem_cgroup_uncharge_folios()
  mm: Remove use of folio list from folios_put()
  mm: Use free_unref_folios() in put_pages_list()
  mm: use __page_cache_release() in folios_put()
  mm: Handle large folios in free_unref_folios()
  mm: Allow non-hugetlb large folios to be batch processed
  mm: Free folios in a batch in shrink_folio_list()
  mm: Free folios directly in move_folios_to_lru()
  memcg: Remove mem_cgroup_uncharge_list()
  mm: Remove free_unref_page_list()
  mm: Remove lru_to_page()
  mm: Convert free_pages_and_swap_cache() to use folios_put()
  mm: Use a folio in __collapse_huge_page_copy_succeeded()
  mm: Convert free_swap_cache() to take a folio

 include/linux/memcontrol.h |  26 +++--
 include/linux/mm.h         |  17 ++--
 include/linux/swap.h       |   8 +-
 mm/internal.h              |   4 +-
 mm/khugepaged.c            |  30 +++---
 mm/memcontrol.c            |  16 +--
 mm/memory.c                |   2 +-
 mm/mlock.c                 |   3 +-
 mm/page_alloc.c            |  76 +++++++-------
 mm/swap.c                  | 198 ++++++++++++++++++++-----------------
 mm/swap_state.c            |  33 ++++---
 mm/vmscan.c                |  52 ++++------
 12 files changed, 240 insertions(+), 225 deletions(-)

-- 
2.43.0