Re: [PATCH v3 1/2] slub: introduce count_partial_free_approx()

On 4/20/24 2:18 AM, David Rientjes wrote:
> On Fri, 19 Apr 2024, Jianfeng Wang wrote:
> 
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 1bb2a93cf7b6..993cbbdd2b6c 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -3213,6 +3213,43 @@ static inline bool free_debug_processing(struct kmem_cache *s,
>>  #endif /* CONFIG_SLUB_DEBUG */
>>  
>>  #if defined(CONFIG_SLUB_DEBUG) || defined(SLAB_SUPPORTS_SYSFS)
>> +#define MAX_PARTIAL_TO_SCAN 10000
>> +
>> +static unsigned long count_partial_free_approx(struct kmem_cache_node *n)
>> +{
>> +	unsigned long flags;
>> +	unsigned long x = 0;
>> +	struct slab *slab;
>> +
>> +	spin_lock_irqsave(&n->list_lock, flags);
>> +	if (n->nr_partial <= MAX_PARTIAL_TO_SCAN) {
>> +		list_for_each_entry(slab, &n->partial, slab_list)
>> +			x += slab->objects - slab->inuse;
>> +	} else {
>> +		/*
>> +		 * For a long list, approximate the total count of objects in
>> +		 * it to meet the limit on the number of slabs to scan.
>> +		 * Scan from both the list's head and tail for better accuracy.
>> +		 */
>> +		unsigned long scanned = 0;
>> +
>> +		list_for_each_entry(slab, &n->partial, slab_list) {
>> +			x += slab->objects - slab->inuse;
>> +			if (++scanned == MAX_PARTIAL_TO_SCAN / 2)
>> +				break;
>> +		}
>> +		list_for_each_entry_reverse(slab, &n->partial, slab_list) {
>> +			x += slab->objects - slab->inuse;
>> +			if (++scanned == MAX_PARTIAL_TO_SCAN)
>> +				break;
>> +		}
>> +		x = mult_frac(x, n->nr_partial, scanned);
>> +		x = min(x, node_nr_objs(n));
>> +	}
>> +	spin_unlock_irqrestore(&n->list_lock, flags);
>> +	return x;
>> +}
> 
> Creative :)
> 
> The default value of MAX_PARTIAL_TO_SCAN seems to work well in practice 
> while being large enough to bias for actual values?
> 
> I can't think of a better way to avoid the disruption that very long 
> partial lists cause.  If the actual value is needed, it will need to be 
> read from the sysfs file for that slab cache.
> 
> It does raise the question of whether we want to extend slabinfo to indicate 
> that some fields are approximations, however.  Adding a suffix such as 
> " : approx" to a slab cache line may be helpful if the disparity in the 
> estimates would actually make a difference in practice.

I'm afraid that changing the layout of /proc/slabinfo has a much higher
chance of breaking some consumer than the imprecision due to the
approximation does. So I would rather not risk it.
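
For anyone who wants to play with the numbers, here is a minimal userspace
sketch of the sample-and-scale idea in the patch above. Everything in it
(struct fake_slab, the flat array standing in for the partial list, and the
sizes in main()) is made up for illustration; it is not kernel code.

/* Userspace sketch of the head-and-tail sampling estimate.
 * fake_slab and the array layout are illustrative stand-ins only. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_PARTIAL_TO_SCAN 10000

struct fake_slab {
	unsigned int objects;	/* capacity of the slab */
	unsigned int inuse;	/* allocated objects */
};

/* Estimate total free objects across 'nr_partial' slabs by scanning at most
 * MAX_PARTIAL_TO_SCAN of them: half from the head, half from the tail, then
 * scaling the sampled sum by nr_partial / scanned. */
static unsigned long estimate_free(struct fake_slab *slabs,
				   unsigned long nr_partial,
				   unsigned long total_objects)
{
	unsigned long x = 0, scanned = 0, i;

	if (nr_partial <= MAX_PARTIAL_TO_SCAN) {
		for (i = 0; i < nr_partial; i++)
			x += slabs[i].objects - slabs[i].inuse;
		return x;
	}

	/* First half of the sample comes from the head of the list... */
	for (i = 0; scanned < MAX_PARTIAL_TO_SCAN / 2; i++, scanned++)
		x += slabs[i].objects - slabs[i].inuse;
	/* ...and the second half from the tail, to reduce bias when free
	 * objects are unevenly distributed along the list. */
	for (i = nr_partial - 1; scanned < MAX_PARTIAL_TO_SCAN; i--, scanned++)
		x += slabs[i].objects - slabs[i].inuse;

	/* Scale the sample up to the full list (mirrors mult_frac()),
	 * then clamp to the number of objects the node actually holds. */
	x = x / scanned * nr_partial + (x % scanned) * nr_partial / scanned;
	if (x > total_objects)
		x = total_objects;
	return x;
}

int main(void)
{
	unsigned long n = 50000, total = 0, i;
	struct fake_slab *slabs = calloc(n, sizeof(*slabs));

	for (i = 0; i < n; i++) {
		slabs[i].objects = 32;
		slabs[i].inuse = rand() % 32;	/* partial: inuse < objects */
		total += slabs[i].objects;
	}
	printf("estimated free objects: %lu\n", estimate_free(slabs, n, total));
	free(slabs);
	return 0;
}

The two-ended sample is what the patch's comment refers to: taking half of the
sample from each end of the list gives a better estimate than a head-only scan
when free space is concentrated at one end.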

> I have a hard time believing that this approximation will not be "close 
> enough" for all practical purposes, given that the value could very well 
> substantially change the instant after the iteration is done anyway.
> 
> So for that reason, this sounds good to me!
> 
> Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
> 
>> +
>>  static unsigned long count_partial(struct kmem_cache_node *n,
>>  					int (*get_count)(struct slab *))
>>  {
>> @@ -7089,7 +7126,7 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
>>  	for_each_kmem_cache_node(s, node, n) {
>>  		nr_slabs += node_nr_slabs(n);
>>  		nr_objs += node_nr_objs(n);
>> -		nr_free += count_partial(n, count_free);
>> +		nr_free += count_partial_free_approx(n);
>>  	}
>>  
>>  	sinfo->active_objs = nr_objs - nr_free;




