On 2/22/24 7:02 PM, Christoph Lameter (Ampere) wrote:
> On Thu, 22 Feb 2024, Chengming Zhou wrote:
>
>> Anyway, I put the code below for discussion...
>
> Can we guestimate the free objects based on the number of partial slabs.
> That number is available.
>

Yes.
I've thought about calculating the average number of free objects in a
partial slab (through sampling) and then estimating the total number of
free objects as (avg * n->nr_partial).

See the following.

---
 mm/slub.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 63d281dfacdb..13385761049c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2963,6 +2963,8 @@ static inline bool free_debug_processing(struct kmem_cache *s,
 #endif /* CONFIG_SLUB_DEBUG */
 
 #if defined(CONFIG_SLUB_DEBUG) || defined(SLAB_SUPPORTS_SYSFS)
+#define MAX_PARTIAL_TO_SCAN 10000
+
 static unsigned long count_partial(struct kmem_cache_node *n,
 					int (*get_count)(struct slab *))
 {
@@ -2971,8 +2973,22 @@ static unsigned long count_partial(struct kmem_cache_node *n,
 	struct slab *slab;
 
 	spin_lock_irqsave(&n->list_lock, flags);
-	list_for_each_entry(slab, &n->partial, slab_list)
-		x += get_count(slab);
+	if (n->nr_partial > MAX_PARTIAL_TO_SCAN) {
+		/* Estimate total count of objects via sampling */
+		unsigned long sample_rate = n->nr_partial / MAX_PARTIAL_TO_SCAN;
+		unsigned long scanned = 0;
+		unsigned long counted = 0;
+		list_for_each_entry(slab, &n->partial, slab_list) {
+			if (++scanned % sample_rate == 0) {
+				x += get_count(slab);
+				counted++;
+			}
+		}
+		x = mult_frac(x, n->nr_partial, counted);
+	} else {
+		list_for_each_entry(slab, &n->partial, slab_list)
+			x += get_count(slab);
+	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
 	return x;
 }
--

> How accurate need the accounting be? We also have fuzzy accounting in the
> VM counters.

Based on my experience, for a kmem_cache, the total number of objects
can tell whether the kmem_cache has been heavily used by a workload.
When the total number is large: if the number of free objects is small,
then either these objects are really in use or there is a *memory leak*
going on (which then must be diagnosed further). However, if the number
of free objects is large, all we can tell is that the slab memory is
fragmented.

So, I think the object accounting does not need to be exact. We only
have to tell whether a large percentage of slab objects is free or not.
The code above samples the partial list, which should do the job if we
take enough samples.
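
To get a feel for how the estimate behaves, here is a minimal userspace
sketch of the same sampling logic. It is not the kernel code and rests on
a few assumptions: the array free_objs[] stands in for walking n->partial
and calling get_count(), MAX_TO_SCAN is a hypothetical, deliberately small
stand-in for MAX_PARTIAL_TO_SCAN, and a wider multiply replaces
mult_frac(). exact_free() is only a helper for comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MAX_PARTIAL_TO_SCAN, kept small for experiments. */
#define MAX_TO_SCAN 100

/*
 * Userspace model of the sampled count_partial(): free_objs[i] plays the
 * role of get_count(slab) for the i-th slab on the partial list.
 */
static unsigned long estimate_free(const unsigned int *free_objs,
				   unsigned long nr_partial)
{
	unsigned long x = 0;
	unsigned long i;

	if (nr_partial > MAX_TO_SCAN) {
		/* Visit every sample_rate-th slab, as in the patch. */
		unsigned long sample_rate = nr_partial / MAX_TO_SCAN;
		unsigned long counted = 0;

		for (i = 0; i < nr_partial; i++) {
			if ((i + 1) % sample_rate == 0) {
				x += free_objs[i];
				counted++;
			}
		}
		/* nr_partial >= sample_rate, so at least one slab was sampled. */
		assert(counted > 0);
		/* Scale the sampled sum up to the whole list: avg * nr_partial. */
		x = (unsigned long)((unsigned long long)x * nr_partial / counted);
	} else {
		/* Short list: count every slab exactly. */
		for (i = 0; i < nr_partial; i++)
			x += free_objs[i];
	}
	return x;
}

/* Exact count over the whole list, for comparison with the estimate. */
static unsigned long exact_free(const unsigned int *free_objs,
				unsigned long nr_partial)
{
	unsigned long x = 0;
	unsigned long i;

	for (i = 0; i < nr_partial; i++)
		x += free_objs[i];
	return x;
}
```

When every slab has the same number of free objects, the estimate equals
the exact count. Feeding it skewed distributions and comparing against
exact_free() shows how much the fixed stride costs in accuracy.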