Re: slabinfo shows incorrect active_objs ???

On Mon, Feb 28, 2022 at 09:17:27AM +0300, Vasily Averin wrote:
> On 25.02.2022 07:37, Vasily Averin wrote:
> > On 25.02.2022 03:08, Roman Gushchin wrote:
> > > 
> > > > On Feb 24, 2022, at 5:17 AM, Vasily Averin <vvs@xxxxxxxxxxxxx> wrote:
> > > > 
> > > > On 22.02.2022 19:32, Shakeel Butt wrote:
> > > > > If you are just interested in the stats, you can use SLAB for your experiments.
> > > > 
> > > > Unfortunately memcg_slabinfo.py does not support SLAB right now.
> > > > 
> > > > > On 23.02.2022 20:31, Vlastimil Babka wrote:
> > > > > > On 2/23/22 04:45, Hyeonggon Yoo wrote:
> > > > > > On Wed, Feb 23, 2022 at 01:32:36AM +0100, Vlastimil Babka wrote:
> > > > > > > Hm it would be easier just to disable merging when the precise counters are
> > > > > > > enabled. Assume it would be a config option (possibly boot-time option with
> > > > > > > static keys) anyway so those who don't need them can avoid the overhead.
> > > > > > 
> > > > > > Is it possible to accurately account objects in SLUB? I think it's not
> > > > > > easy, because a CPU can free objects to a remote CPU's partial slabs using
> > > > > > cmpxchg_double()...
> > > > > AFAIU Roman's idea would be that each alloc/free would simply inc/dec an
> > > > > object counter that's disconnected from the physical handling in the
> > > > > particular sl*b implementation. It would provide an exact count of objects
> > > > > from the perspective of slab users.
> > > > > I assume that, for reduced overhead, the counters would be implemented in a
> > > > > percpu fashion like e.g. vmstats. Slabinfo gathering would thus have to
> > > > > e.g. sum up those percpu counters.
> > > > 
> > > > I like this idea too and I'm going to spend some time for its implementation.
> > > 
> > > Sounds good!
> > > 
> > > Unfortunately it's quite tricky: the problem is that there is potentially a large and dynamic set of cgroups and also a large and dynamic set of slab caches. Given the performance considerations, it's also unlikely that we can avoid using percpu variables.
> > > So we come to (nr_slab_caches * nr_cgroups * nr_cpus) "objects". If we create them proactively, we're likely wasting a lot of memory. Creating them on demand is tricky too (especially without losing some accounting accuracy).
> > 
> > I was talking about global (i.e. non-memcg) precise slab counters only.
> > I expect this can be done under a new config option and/or static key, and if enabled the counters would be used for the /proc/slabinfo output.
> > 
> > At present I'm still going to extract memcg counters via your memcg_slabinfo script.
> 
> I'm not sure I'll be able to debug this patch properly, so I decided to submit it as is.
> I hope it can be useful.
> 
> In general it works and /proc/slabinfo shows reasonable numbers,
> however in some cases they differ from crash's "kmem -s" output, by either +1 or -1.
> Obviously I missed something.
> 
> ---[cut here]---
> [PATCH RFC] slub: precise in-use counter for /proc/slabinfo output
> 
> Signed-off-by: Vasily Averin <vvs@xxxxxxxxxxxxx>
> ---
>  include/linux/slub_def.h |  3 +++
>  init/Kconfig             |  7 +++++++
>  mm/slub.c                | 20 +++++++++++++++++++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 33c5c0e3bd8d..d22e18dfe905 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -56,6 +56,9 @@ struct kmem_cache_cpu {
>  #ifdef CONFIG_SLUB_STATS
>  	unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
> +#ifdef CONFIG_SLUB_PRECISE_INUSE
> +	unsigned inuse;		/* Precise in-use counter */
> +#endif
>  };
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
> diff --git a/init/Kconfig b/init/Kconfig
> index e9119bf54b1f..5c57bdbb8938 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1995,6 +1995,13 @@ config SLUB_CPU_PARTIAL
>  	  which requires the taking of locks that may cause latency spikes.
>  	  Typically one would choose no for a realtime system.
> +config SLUB_PRECISE_INUSE
> +	default n
> +	depends on SLUB && SMP
> +	bool "SLUB precise in-use counter"
> +	help
> +	  Per-cpu in-use counter shows precise statistics in slabinfo.
> +
>  config MMAP_ALLOW_UNINITIALIZED
>  	bool "Allow mmapped anonymous memory to be uninitialized"
>  	depends on EXPERT && !MMU
> diff --git a/mm/slub.c b/mm/slub.c
> index 261474092e43..90750cae0af9 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3228,6 +3228,9 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
>  out:
>  	slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
> +#ifdef CONFIG_SLUB_PRECISE_INUSE
> +	raw_cpu_inc(s->cpu_slab->inuse);
> +#endif
>  	return object;
>  }
> @@ -3506,8 +3509,12 @@ static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
>  	 * With KASAN enabled slab_free_freelist_hook modifies the freelist
>  	 * to remove objects, whose reuse must be delayed.
>  	 */
> -	if (slab_free_freelist_hook(s, &head, &tail, &cnt))
> +	if (slab_free_freelist_hook(s, &head, &tail, &cnt)) {
>  		do_slab_free(s, slab, head, tail, cnt, addr);
> +#ifdef CONFIG_SLUB_PRECISE_INUSE
> +		raw_cpu_sub(s->cpu_slab->inuse, cnt);
> +#endif
> +	}
>  }
>  #ifdef CONFIG_KASAN_GENERIC
> @@ -6253,6 +6260,17 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
>  		nr_free += count_partial(n, count_free);
>  	}
> +#ifdef CONFIG_SLUB_PRECISE_INUSE
> +	{
> +		unsigned int cpu, nr_inuse = 0;
> +
> +		for_each_possible_cpu(cpu)
> +			nr_inuse += per_cpu_ptr((s)->cpu_slab, cpu)->inuse;
> +
> +		if (nr_inuse <= nr_objs)
> +			nr_free = nr_objs - nr_inuse;
> +	}
> +#endif
>  	sinfo->active_objs = nr_objs - nr_free;
>  	sinfo->num_objs = nr_objs;
>  	sinfo->active_slabs = nr_slabs;

Hi Vasily, thank you for this patch.
This looks nice, but I see a few things we can improve:

1) Using raw_cpu_{inc,sub}(), the updates to s->cpu_slab->inuse will be racy if
the kernel can be preempted, because SLUB does not disable preemption/interrupts
at all in the fastpath.

And yeah, we can accept being racy to some degree, but the counter will drift
further and further from the real value the longer the system is up. So I think
an atomic integer is the right choice if correctness is important?
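
For example, a rough and untested sketch of the two options; note that
"precise_inuse" below is an imaginary field I made up for illustration,
not something from your patch:

	/*
	 * Option A: keep the per-cpu counter, but update it with
	 * this_cpu_*() so the read-modify-write cannot be torn by
	 * preemption or an interrupt on the local CPU (unlike raw_cpu_*()).
	 */
	static inline void inuse_inc(struct kmem_cache *s)
	{
		this_cpu_inc(s->cpu_slab->inuse);
	}

	/*
	 * Option B: one shared atomic counter per cache; always exact,
	 * but every alloc/free now bounces a cache line between CPUs.
	 */
	static inline void inuse_inc_atomic(struct kmem_cache *s)
	{
		atomic_long_inc(&s->precise_inuse);	/* imaginary atomic_long_t */
	}

With this_cpu_*() no increments are lost, so summing the per-cpu values in
get_slabinfo() stays accurate over time; the atomic variant is always exact
but makes every alloc/free touch a shared cache line.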

2) This code is not aware of cpu partials. There is a list of slabs for
each kmem_cache_cpu, and you can iterate it as:

	kmem_cache_cpu->partial->next->next->next->... and so on until it reaches NULL.

So we need to count the cpu partials' in-use objects too.
Then we need per-slab counters... I think we can use struct slab's
__unused field for this?
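
Something like this, maybe; this is only a rough sketch with no locking,
and the c->partial / slab->next / slab->inuse fields are from my reading
of the current code, so the details may differ:

	#ifdef CONFIG_SLUB_CPU_PARTIAL
	static unsigned long count_cpu_partial_inuse(struct kmem_cache *s)
	{
		unsigned long inuse = 0;
		int cpu;

		for_each_possible_cpu(cpu) {
			struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
			struct slab *slab;

			/* the list can change under us, so this is only an estimate */
			for (slab = READ_ONCE(c->partial); slab;
			     slab = READ_ONCE(slab->next))
				inuse += slab->inuse;
		}
		return inuse;
	}
	#endif

get_slabinfo() could then add this on top of the per-node partial counts.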

Thanks :)

-- 
Thank you, You are awesome!
Hyeonggon :-)




