On Mon, Feb 28, 2022 at 09:17:27AM +0300, Vasily Averin wrote: > On 25.02.2022 07:37, Vasily Averin wrote: > > On 25.02.2022 03:08, Roman Gushchin wrote: > > > > > > > On Feb 24, 2022, at 5:17 AM, Vasily Averin <vvs@xxxxxxxxxxxxx> wrote: > > > > > > > > On 22.02.2022 19:32, Shakeel Butt wrote: > > > > > If you are just interested in the stats, you can use SLAB for your experiments. > > > > > > > > Unfortunately memcg_slabino.py does not support SLAB right now. > > > > > > > > > On 23.02.2022 20:31, Vlastimil Babka wrote: > > > > > > On 2/23/22 04:45, Hyeonggon Yoo wrote: > > > > > > On Wed, Feb 23, 2022 at 01:32:36AM +0100, Vlastimil Babka wrote: > > > > > > > Hm it would be easier just to disable merging when the precise counters are > > > > > > > enabled. Assume it would be a config option (possibly boot-time option with > > > > > > > static keys) anyway so those who don't need them can avoid the overhead. > > > > > > > > > > > > Is it possible to accurately account objects in SLUB? I think it's not > > > > > > easy because a CPU can free objects to remote cpu's partial slabs using > > > > > > cmpxchg_double()... > > > > > AFAIU Roman's idea would be that each alloc/free would simply inc/dec an > > > > > object counter that's disconnected from physical handling of particular sl*b > > > > > implementation. It would provide exact count of objects from the perspective > > > > > of slab users. > > > > > I assume for reduced overhead the counters would be implemented in a percpu > > > > > fashion as e.g. vmstats. Slabinfo gathering would thus have to e.g. sum up > > > > > those percpu counters. > > > > > > > > I like this idea too and I'm going to spend some time for its implementation. > > > > > > Sounds good! > > > > > > Unfortunately it’s quite tricky: the problem is that there is potentially a large and dynamic set of cgroups and also large and dynamic set of slab caches. Given the performance considerations, it’s also unlikely to avoid using percpu variables. > > > So we come to the (nr_slab_caches * nr_cgroups * nr_cpus) number of “objects”. If we create them proactively, we’re likely wasting lot of memory. Creating them on demand is tricky too (especially without losing some accounting accuracy). > > > > I told about global (i.e. non-memcg) precise slab counters only. > > I'm expect it can done under new config option and/or static key, and if present use them in /proc/slabinfo output. > > > > At present I'm still going to extract memcg counters via your memcg_slabinfo script. > > I'm not sure I'll be able to debug this patch properly and decided to submit it as is. > I hope it can be useful. > > In general it works and /proc/slabinfo shows reasonable numbers, > however in some cases they differs from crash' "kmem -s" output, either +1 or -1. > Obviously I missed something. > Oh, sorry for the noise. You implemented what Roman said. So s->cpu_slab->inuse is just per-cpu counters for every object of a cache, not cpu slab. Please ignore my last feedback. Anyway, I think the +1 or -1 difference is due to race? What was your preemption model? > ---[cut here]--- > [PATCH RFC] slub: precise in-use counter for /proc/slabinfo output > > Signed-off-by: Vasily Averin <vvs@xxxxxxxxxxxxx> > --- > include/linux/slub_def.h | 3 +++ > init/Kconfig | 7 +++++++ > mm/slub.c | 20 +++++++++++++++++++- > 3 files changed, 29 insertions(+), 1 deletion(-) > > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h > index 33c5c0e3bd8d..d22e18dfe905 100644 > --- a/include/linux/slub_def.h > +++ b/include/linux/slub_def.h > @@ -56,6 +56,9 @@ struct kmem_cache_cpu { > #ifdef CONFIG_SLUB_STATS > unsigned stat[NR_SLUB_STAT_ITEMS]; > #endif > +#ifdef CONFIG_SLUB_PRECISE_INUSE > + unsigned inuse; /* Precise in-use counter */ > +#endif > }; > #ifdef CONFIG_SLUB_CPU_PARTIAL > diff --git a/init/Kconfig b/init/Kconfig > index e9119bf54b1f..5c57bdbb8938 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1995,6 +1995,13 @@ config SLUB_CPU_PARTIAL > which requires the taking of locks that may cause latency spikes. > Typically one would choose no for a realtime system. > +config SLUB_PRECISE_INUSE > + default n > + depends on SLUB && SMP > + bool "SLUB precise in-use counter" > + help > + Per cpu in-use counter shows precise statistic in slabinfo. > + > config MMAP_ALLOW_UNINITIALIZED > bool "Allow mmapped anonymous memory to be uninitialized" > depends on EXPERT && !MMU > diff --git a/mm/slub.c b/mm/slub.c > index 261474092e43..90750cae0af9 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -3228,6 +3228,9 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, > out: > slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init); > +#ifdef CONFIG_SLUB_PRECISE_INUSE > + raw_cpu_inc(s->cpu_slab->inuse); > +#endif > return object; > } > @@ -3506,8 +3509,12 @@ static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab, > * With KASAN enabled slab_free_freelist_hook modifies the freelist > * to remove objects, whose reuse must be delayed. > */ > - if (slab_free_freelist_hook(s, &head, &tail, &cnt)) > + if (slab_free_freelist_hook(s, &head, &tail, &cnt)) { > do_slab_free(s, slab, head, tail, cnt, addr); > +#ifdef CONFIG_SLUB_PRECISE_INUSE > + raw_cpu_sub(s->cpu_slab->inuse, cnt); > +#endif > + } > } > #ifdef CONFIG_KASAN_GENERIC > @@ -6253,6 +6260,17 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo) > nr_free += count_partial(n, count_free); > } > +#ifdef CONFIG_SLUB_PRECISE_INUSE > + { > + unsigned int cpu, nr_inuse = 0; > + > + for_each_possible_cpu(cpu) > + nr_inuse += per_cpu_ptr((s)->cpu_slab, cpu)->inuse; > + > + if (nr_inuse <= nr_objs) > + nr_free = nr_objs - nr_inuse; > + } > +#endif > sinfo->active_objs = nr_objs - nr_free; > sinfo->num_objs = nr_objs; > sinfo->active_slabs = nr_slabs; > -- > 2.25.1 -- Thank you, You are awesome! Hyeonggon :-)