Hi Linus,

please pull the latest slab updates from:

  git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git tags/slab-for-6.10

Sending this early due to upcoming LSF/MM travel and chances there's no rc8.

Thanks,
Vlastimil

======================================

This time it's mostly random cleanups and fixes, with two performance fixes
that might have significant impact, but limited to systems experiencing
particularly bad corner case scenarios rather than general performance
improvements.

The memcg hook changes are going through the mm tree due to dependencies.

- Prevent stalls when reading /proc/slabinfo (Jianfeng Wang)

  This fixes the long-standing problem that can happen with workloads that
  have alloc/free patterns resulting in many partially used slabs (in e.g.
  dentry cache). Reading /proc/slabinfo will traverse the long partial slab
  list under spinlock with disabled irqs and thus can stall other processes
  or even trigger the lockup detection. The traversal is only done to count
  free objects so that the <active_objs> column can be reported along with
  <num_objs>.

  To avoid affecting fast paths with another shared counter (attempted in
  the past) or complex partial list traversal schemes that allow
  rescheduling, the chosen solution resorts to approximation - when the
  partial list is over 10000 slabs long, we will only traverse the first
  5000 slabs from each of the head and the tail and use the average of
  those to estimate the whole list. Both head and tail are used as the
  slabs near the head tend to have more free objects than the slabs towards
  the tail.

  It is expected the approximation should not break existing /proc/slabinfo
  consumers. The <num_objs> field is still accurate and reflects the
  overall kmem_cache footprint. The <active_objs> was already imprecise due
  to cpu and percpu-partial slabs, so it can't be relied upon to determine
  exact cache usage. The difference between <active_objs> and <num_objs> is
  mainly useful to determine the slab fragmentation, and that will be
  possible even with the approximation in place.

- Prevent allocating many slabs when a NUMA node is full (Chen Jun)

  Currently, on NUMA systems with a node under significantly bigger
  pressure than other nodes, the fallback strategy may cause each
  kmalloc_node() that can't be satisfied from the preferred node to
  allocate a new slab on a fallback node instead of reusing the slabs
  already on that node's partial list.

  This is now fixed and partial lists of fallback nodes are checked even
  for kmalloc_node() allocations. It's still preferred to allocate a new
  slab on the requested node before a fallback, but only with a GFP_NOWAIT
  attempt, which will fail quickly when the node is under significant
  memory pressure.

- More SLAB removal related cleanups (Xiu Jianfeng, Hyunmin Lee)

- Fix slub_kunit self-test with hardened freelists (Guenter Roeck)

- Mark racy accesses for KCSAN (linke li)

- Misc cleanups (Xiongwei Song, Haifeng Xu, Sangyun Kim)

----------------------------------------------------------------
Chen Jun (1):
      mm/slub: Reduce memory consumption in extreme scenarios

Guenter Roeck (1):
      mm/slub, kunit: Use inverted data to corrupt kmem cache

Haifeng Xu (1):
      slub: Set __GFP_COMP in kmem_cache by default

Hyunmin Lee (2):
      mm/slub: create kmalloc 96 and 192 caches regardless cache size order
      mm/slub: remove the check for NULL kmalloc_caches

Jianfeng Wang (2):
      slub: introduce count_partial_free_approx()
      slub: use count_partial_free_approx() in slab_out_of_memory()

Sangyun Kim (1):
      mm/slub: remove duplicate initialization for early_kmem_cache_node_alloc()

Xiongwei Song (3):
      mm/slub: remove the check of !kmem_cache_has_cpu_partial()
      mm/slub: add slub_get_cpu_partial() helper
      mm/slub: simplify get_partial_node()

Xiu Jianfeng (2):
      mm/slub: remove dummy slabinfo functions
      mm/slub: correct comment in do_slab_free()

linke li (2):
      mm/slub: mark racy accesses on slab->slabs
      mm/slub: mark racy access on slab->freelist

 lib/slub_kunit.c |   2 +-
 mm/slab.h        |   3 --
 mm/slab_common.c |  27 +++++--------
 mm/slub.c        | 118 ++++++++++++++++++++++++++++++++++++++++---------------
 4 files changed, 96 insertions(+), 54 deletions(-)
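
As a reading aid for the /proc/slabinfo item above, the head-and-tail
estimate can be sketched in userspace C. This is only an illustration under
stated assumptions, not the code in mm/slub.c: the partial list is modelled
as a plain array of per-slab free-object counts, and the names
count_free_approx() and PARTIAL_SCAN_LIMIT are hypothetical. Only the 10000
and 5000 figures come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed from the text: lists longer than this many slabs are sampled. */
#define PARTIAL_SCAN_LIMIT 10000

/*
 * Hypothetical model: free_objs[i] is the number of free objects in the
 * i-th slab of a partial list, with index 0 being the list head.
 * Returns the exact total for short lists and an estimate for long ones.
 */
static unsigned long count_free_approx(const unsigned int *free_objs,
				       size_t nr_partial)
{
	const size_t half = PARTIAL_SCAN_LIMIT / 2;
	unsigned long long sum = 0;
	size_t i;

	assert(free_objs != NULL || nr_partial == 0);

	/* Short list: walk everything, the result is exact. */
	if (nr_partial <= PARTIAL_SCAN_LIMIT) {
		for (i = 0; i < nr_partial; i++)
			sum += free_objs[i];
		return (unsigned long)sum;
	}

	/* Long list: sample 'half' slabs from the head ... */
	for (i = 0; i < half; i++)
		sum += free_objs[i];
	/* ... and 'half' slabs from the tail. */
	for (i = nr_partial - half; i < nr_partial; i++)
		sum += free_objs[i];

	/* Scale the sampled average up to the full list length. */
	return (unsigned long)(sum * nr_partial / (2 * half));
}
```

Sampling both ends keeps the walk bounded at 10000 slabs regardless of list
length, while averaging head and tail compensates for the head slabs
tending to hold more free objects than the tail slabs.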