On 25.02.2022 07:37, Vasily Averin wrote:
On 25.02.2022 03:08, Roman Gushchin wrote:
On Feb 24, 2022, at 5:17 AM, Vasily Averin <vvs@xxxxxxxxxxxxx> wrote:
On 22.02.2022 19:32, Shakeel Butt wrote:
If you are just interested in the stats, you can use SLAB for your experiments.
Unfortunately memcg_slabinfo.py does not support SLAB right now.
On 23.02.2022 20:31, Vlastimil Babka wrote:
On 2/23/22 04:45, Hyeonggon Yoo wrote:
On Wed, Feb 23, 2022 at 01:32:36AM +0100, Vlastimil Babka wrote:
Hm, it would be easier just to disable merging when the precise counters are
enabled. I assume it would be a config option (possibly a boot-time option with
static keys) anyway, so those who don't need them can avoid the overhead.
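(Purely for illustration: a minimal sketch of the boot-time gate with static keys mentioned above. The parameter name and key name are invented for this sketch; the counter touched in the hot path is the per-cpu field the RFC patch further down adds.)

/*
 * Hypothetical boot-time switch, names made up for this sketch.
 * With the key disabled, the hot-path increment is a patched-out branch.
 */
DEFINE_STATIC_KEY_FALSE(slub_precise_inuse);

static int __init setup_slub_precise_inuse(char *str)
{
	static_branch_enable(&slub_precise_inuse);
	return 1;
}
__setup("slub_precise_inuse", setup_slub_precise_inuse);

	/* allocation hot path */
	if (static_branch_unlikely(&slub_precise_inuse))
		raw_cpu_inc(s->cpu_slab->inuse);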
Is it possible to accurately account objects in SLUB? I think it's not
easy because a CPU can free objects to a remote CPU's partial slabs using
cmpxchg_double()...
AFAIU Roman's idea would be that each alloc/free would simply inc/dec an
object counter that's disconnected from the physical object handling of the
particular sl*b implementation. It would provide an exact count of objects
from the perspective of slab users.
I assume that for reduced overhead the counters would be implemented in a
percpu fashion, like e.g. vmstats. Slabinfo gathering would thus have to sum up
those percpu counters.
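(A minimal sketch of such a disconnected percpu counter, just to make the idea concrete; the struct and helper names are hypothetical, and the read side only gives an eventually consistent sum while other CPUs keep updating.)

/* Hypothetical per-cache object counter, independent of slab internals. */
struct precise_counter {
	long __percpu *objects;		/* from alloc_percpu(long) */
};

static inline void precise_count_alloc(struct precise_counter *c)
{
	this_cpu_inc(*c->objects);	/* cheap, no cross-cpu traffic */
}

static inline void precise_count_free(struct precise_counter *c, unsigned int nr)
{
	this_cpu_sub(*c->objects, nr);	/* nr objects freed at once */
}

static long precise_count_read(struct precise_counter *c)
{
	long sum = 0;
	int cpu;

	/* slabinfo path: sum the per-cpu deltas, tolerating transient skew */
	for_each_possible_cpu(cpu)
		sum += *per_cpu_ptr(c->objects, cpu);
	return sum;
}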
I like this idea too and I'm going to spend some time on its implementation.
Sounds good!
Unfortunately it’s quite tricky: the problem is that there is potentially a large and dynamic set of cgroups and also a large and dynamic set of slab caches. Given the performance considerations, it’s also unlikely that we can avoid using percpu variables.
So we end up with (nr_slab_caches * nr_cgroups * nr_cpus) “objects”. If we create them proactively, we’re likely wasting a lot of memory. Creating them on demand is tricky too (especially without losing some accounting accuracy).
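(Rough numbers, purely illustrative: with ~150 slab caches, 1000 cgroups, 128 possible CPUs and a 4-byte counter each, proactive allocation already costs about 150 * 1000 * 128 * 4 bytes ≈ 73 MiB, most of it for cache/cgroup pairs that will never allocate a single object.)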
I was talking only about global (i.e. non-memcg) precise slab counters.
I expect this can be done under a new config option and/or static key, and, if enabled, the counters would be used in /proc/slabinfo output.
At present I'm still going to extract memcg counters via your memcg_slabinfo script.
I'm not sure I'll be able to debug this patch properly, so I decided to submit it as is.
I hope it can be useful.
In general it works and /proc/slabinfo shows reasonable numbers;
however, in some cases they differ from crash's "kmem -s" output by either +1 or -1.
Obviously I missed something.
---[cut here]---
[PATCH RFC] slub: precise in-use counter for /proc/slabinfo output
Signed-off-by: Vasily Averin <vvs@xxxxxxxxxxxxx>
---
include/linux/slub_def.h | 3 +++
init/Kconfig | 7 +++++++
mm/slub.c | 20 +++++++++++++++++++-
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 33c5c0e3bd8d..d22e18dfe905 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -56,6 +56,9 @@ struct kmem_cache_cpu {
#ifdef CONFIG_SLUB_STATS
unsigned stat[NR_SLUB_STAT_ITEMS];
#endif
+#ifdef CONFIG_SLUB_PRECISE_INUSE
+ unsigned inuse; /* Precise in-use counter */
+#endif
};
#ifdef CONFIG_SLUB_CPU_PARTIAL
diff --git a/init/Kconfig b/init/Kconfig
index e9119bf54b1f..5c57bdbb8938 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1995,6 +1995,13 @@ config SLUB_CPU_PARTIAL
which requires the taking of locks that may cause latency spikes.
Typically one would choose no for a realtime system.
+config SLUB_PRECISE_INUSE
+ default n
+ depends on SLUB && SMP
+ bool "SLUB precise in-use counter"
+ help
+	  Per-cpu in-use counters provide precise statistics in /proc/slabinfo.
+
config MMAP_ALLOW_UNINITIALIZED
bool "Allow mmapped anonymous memory to be uninitialized"
depends on EXPERT && !MMU
diff --git a/mm/slub.c b/mm/slub.c
index 261474092e43..90750cae0af9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3228,6 +3228,9 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
out:
slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
+#ifdef CONFIG_SLUB_PRECISE_INUSE
+ raw_cpu_inc(s->cpu_slab->inuse);
+#endif
return object;
}
@@ -3506,8 +3509,12 @@ static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab,
* With KASAN enabled slab_free_freelist_hook modifies the freelist
* to remove objects, whose reuse must be delayed.
*/
- if (slab_free_freelist_hook(s, &head, &tail, &cnt))
+ if (slab_free_freelist_hook(s, &head, &tail, &cnt)) {
do_slab_free(s, slab, head, tail, cnt, addr);
+#ifdef CONFIG_SLUB_PRECISE_INUSE
+ raw_cpu_sub(s->cpu_slab->inuse, cnt);
+#endif
+ }
}
#ifdef CONFIG_KASAN_GENERIC
@@ -6253,6 +6260,17 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
nr_free += count_partial(n, count_free);
}
+#ifdef CONFIG_SLUB_PRECISE_INUSE
+ {
+ unsigned int cpu, nr_inuse = 0;
+
+ for_each_possible_cpu(cpu)
+ nr_inuse += per_cpu_ptr((s)->cpu_slab, cpu)->inuse;
+
+ if (nr_inuse <= nr_objs)
+ nr_free = nr_objs - nr_inuse;
+ }
+#endif
sinfo->active_objs = nr_objs - nr_free;
sinfo->num_objs = nr_objs;
sinfo->active_slabs = nr_slabs;
--
2.25.1