Other than the obvious "remove calls to compound_head()" changes, the
fundamental belief here is that iterating a linked list is much slower
than iterating an array (5-15x slower in my testing).  There's also an
associated belief that, since we iterate the batch of folios three
times, we do better with a small array (i.e. 15 entries) than with a
batch that is hundreds of entries long, which only gives the first
pages in the batch time to fall out of cache before we come back to
them.  (A rough user-space sketch of the list-vs-array effect is
appended below the diffstat.)

The one place where that probably falls down is "mm: Free folios in a
batch in shrink_folio_list()", where we'll flush the TLB once per
batch instead of once at the end.  That's going to take some
benchmarking.

Matthew Wilcox (Oracle) (14):
  mm: Make folios_put() the basis of release_pages()
  mm: Convert free_unref_page_list() to use folios
  mm: Add free_unref_folios()
  mm: Use folios_put() in __folio_batch_release()
  memcg: Add mem_cgroup_uncharge_folios()
  mm: Remove use of folio list from folios_put()
  mm: Use free_unref_folios() in put_pages_list()
  mm: use __page_cache_release() in folios_put()
  mm: Handle large folios in free_unref_folios()
  mm: Allow non-hugetlb large folios to be batch processed
  mm: Free folios in a batch in shrink_folio_list()
  mm: Free folios directly in move_folios_to_lru()
  memcg: Remove mem_cgroup_uncharge_list()
  mm: Remove free_unref_page_list()

 include/linux/memcontrol.h |  24 ++---
 include/linux/mm.h         |  19 +---
 mm/internal.h              |   4 +-
 mm/memcontrol.c            |  16 ++--
 mm/mlock.c                 |   3 +-
 mm/page_alloc.c            |  74 ++++++++-------
 mm/swap.c                  | 180 ++++++++++++++++++++-----------
 mm/vmscan.c                |  51 +++++------
 8 files changed, 181 insertions(+), 190 deletions(-)

-- 
2.40.1
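
Appendix: a minimal user-space sketch of the list-vs-array point made
above.  This is not the benchmark behind the 5-15x number; struct obj,
walk_array() and walk_list() are invented purely for illustration.  It
walks the same heap-allocated objects once through an array of
pointers (independent loads the CPU can overlap) and once through a
linked list (a chain of dependent loads), which is the effect the
series is built around.

/*
 * Hypothetical illustration only; not the benchmark used for the
 * numbers quoted in the cover letter.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NR_OBJS (1UL << 20)

struct obj {
	struct obj *next;	/* list linkage, analogous to folio->lru */
	unsigned long payload;
};

/*
 * Array walk: the address of each object is known up front, so the
 * loads are independent and can be issued in parallel.
 */
static unsigned long walk_array(struct obj **array, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += array[i]->payload;
	return sum;
}

/*
 * List walk: each object's address comes from the previous object,
 * so every load waits for the one before it.
 */
static unsigned long walk_list(struct obj *head)
{
	unsigned long sum = 0;

	for (struct obj *o = head; o; o = o->next)
		sum += o->payload;
	return sum;
}

static double elapsed_ms(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
	struct obj **array = malloc(NR_OBJS * sizeof(*array));
	struct obj *head = NULL, *tail = NULL;
	struct timespec t0, t1, t2;
	unsigned long s1, s2;

	/*
	 * Allocate the objects individually and thread them onto both
	 * the array and the list in the same order.
	 */
	for (size_t i = 0; i < NR_OBJS; i++) {
		struct obj *o = malloc(sizeof(*o));

		o->payload = i;
		o->next = NULL;
		if (tail)
			tail->next = o;
		else
			head = o;
		tail = o;
		array[i] = o;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	s1 = walk_array(array, NR_OBJS);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	s2 = walk_list(head);
	clock_gettime(CLOCK_MONOTONIC, &t2);

	printf("array walk: %.2f ms (sum %lu)\n", elapsed_ms(t0, t1), s1);
	printf("list walk:  %.2f ms (sum %lu)\n", elapsed_ms(t1, t2), s2);
	free(array);
	return 0;
}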