[GIT PULL] slab updates for 6.10

Vlastimil Babka <vbabka@xxxxxxx> · Thu, 9 May 2024 16:25:05 +0200

Hi Linus,

please pull the latest slab updates from:

  git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git tags/slab-for-6.10

Sending this early due to upcoming LSF/MM travel and chances there's no rc8.

Thanks,
Vlastimil

======================================

This time it's mostly random cleanups and fixes, plus two performance fixes
that might have significant impact, but only on systems hitting particular
bad corner-case scenarios; they are not general performance improvements.

The memcg hook changes are going through the mm tree due to dependencies.

- Prevent stalls when reading /proc/slabinfo (Jianfeng Wang)

  This fixes the long-standing problem that can happen with workloads that have
  alloc/free patterns resulting in many partially used slabs (in e.g. dentry
  cache). Reading /proc/slabinfo will traverse the long partial slab list under
  spinlock with disabled irqs and thus can stall other processes or even
  trigger the lockup detection. The traversal is only done to count free
  objects so that the <active_objs> column can be reported along with
  <num_objs>.

  To avoid affecting fast paths with another shared counter (attempted in the
  past) or complex partial list traversal schemes that allow rescheduling, the
  chosen solution resorts to approximation - when the partial list is over
  10000 slabs long, we will only traverse the first 5000 slabs from the head
  and the tail each and use the average of those to estimate the whole list.
  Both head and tail are used because the slabs near the head tend to have
  more free objects than the slabs towards the tail.

  It is expected the approximation should not break existing /proc/slabinfo
  consumers. The <num_objs> field is still accurate and reflects the overall
  kmem_cache footprint. The <active_objs> field was already imprecise due to
  cpu and percpu-partial slabs, so it can't be relied upon to determine exact
  cache usage.
  The difference between <active_objs> and <num_objs> is mainly useful to
  determine the slab fragmentation, and that will be possible even with the
  approximation in place.

- Prevent allocating many slabs when a NUMA node is full (Chen Jun)

  Currently, on NUMA systems with a node under significantly bigger pressure
  than other nodes, the fallback strategy may result in each kmalloc_node()
  that can't be satisfied from the preferred node allocating a new slab on a
  fallback node instead of reusing the slabs already on that node's partial
  list.

  This is now fixed and partial lists of fallback nodes are checked even for
  kmalloc_node() allocations. It's still preferred to allocate a new slab on
  the requested node before a fallback, but only with a GFP_NOWAIT attempt,
  which will fail quickly when the node is under significant memory pressure.

- More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee)

- Fix slub_kunit self-test with hardened freelists (Guenter Roeck)

- Mark racy accesses for KCSAN (linke li)

- Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)
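
For illustration only, the /proc/slabinfo approximation described above can
be sketched in userspace C roughly as follows. This is not the kernel's
count_partial_free_approx(); the function name approx_free_objects(), the
array standing in for the partial list, and the max_scan parameter are all
assumptions made for this sketch. It counts exactly when the list is short,
and otherwise sums an equal number of slabs from head and tail and scales the
result to the full list length.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a node's partial list: free_per_slab[i] is the
 * number of free objects in the i-th partial slab, head first.
 *
 * max_scan is the scan budget (10000 in the description above); half of it
 * is spent at the head and half at the tail.
 */
static uint64_t approx_free_objects(const unsigned int *free_per_slab,
				    size_t nr_slabs, size_t max_scan)
{
	uint64_t sum = 0;
	size_t i, half, scanned;

	/* Short list (or a budget too small to split): count exactly. */
	if (nr_slabs <= max_scan || max_scan < 2) {
		for (i = 0; i < nr_slabs; i++)
			sum += free_per_slab[i];
		return sum;
	}

	half = max_scan / 2;

	/* Slabs near the head tend to have more free objects... */
	for (i = 0; i < half; i++)
		sum += free_per_slab[i];

	/* ...than slabs near the tail, so sample both ends. */
	for (i = 0; i < half; i++)
		sum += free_per_slab[nr_slabs - 1 - i];

	/* Scale the sampled total to the whole list (average * length). */
	scanned = 2 * half;
	return sum * nr_slabs / scanned;
}
```

With a list of 8 slabs whose free counts are {8, 8, 8, 8, 2, 2, 2, 2} and a
budget of 4, the sketch samples two slabs from each end and scales the sum by
8/4. How closely the estimate tracks the exact count depends on how free
objects are distributed between the sampled ends and the unsampled middle.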

----------------------------------------------------------------
Chen Jun (1):
      mm/slub: Reduce memory consumption in extreme scenarios

Guenter Roeck (1):
      mm/slub, kunit: Use inverted data to corrupt kmem cache

Haifeng Xu (1):
      slub: Set __GFP_COMP in kmem_cache by default

Hyunmin Lee (2):
      mm/slub: create kmalloc 96 and 192 caches regardless cache size order
      mm/slub: remove the check for NULL kmalloc_caches

Jianfeng Wang (2):
      slub: introduce count_partial_free_approx()
      slub: use count_partial_free_approx() in slab_out_of_memory()

Sangyun Kim (1):
      mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc()

Xiongwei Song (3):
      mm/slub: remove the check of !kmem_cache_has_cpu_partial()
      mm/slub: add slub_get_cpu_partial() helper
      mm/slub: simplify get_partial_node()

Xiu Jianfeng (2):
      mm/slub: remove dummy slabinfo functions
      mm/slub: correct comment in do_slab_free()

linke li (2):
      mm/slub: mark racy accesses on slab->slabs
      mm/slub: mark racy access on slab->freelist

 lib/slub_kunit.c |   2 +-
 mm/slab.h        |   3 --
 mm/slab_common.c |  27 +++++--------
 mm/slub.c        | 118 ++++++++++++++++++++++++++++++++++++++++---------------
 4 files changed, 96 insertions(+), 54 deletions(-)