> On Aug 28, 2024, at 01:23, Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Tue, Aug 27, 2024 at 03:06:32AM GMT, Roman Gushchin wrote:
>> On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote:
>>> At the moment, slab objects are charged to the memcg at allocation
>>> time. However, there are cases where slab objects are allocated at a
>>> time when the right target memcg to charge them to is not known. One
>>> such case is network sockets for incoming connections, which are
>>> allocated in softirq context.
>>>
>>> A couple hundred thousand connections are very normal on a large
>>> loaded server, and almost all of the sockets underlying those
>>> connections are allocated in softirq context and thus not charged to
>>> any memcg. However, later at accept() time we know the right target
>>> memcg to charge. Let's add a new API to charge already allocated
>>> objects, so we can have better accounting of memory usage.
>>>
>>> To measure the performance impact of this change, tcp_crr from the
>>> neper [1] performance suite is used. Basically it is a network
>>> ping-pong test with a new connection for each ping-pong.
>>>
>>> The server and the client are run inside a 3-level cgroup hierarchy
>>> using the following commands:
>>>
>>> Server:
>>>  $ tcp_crr -6
>>>
>>> Client:
>>>  $ tcp_crr -6 -c -H ${server_ip}
>>>
>>> If the client and server run on different machines with 50 Gbps
>>> NICs, there is no visible impact of the change.
>>>
>>> For the same-machine experiment, with v6.11-rc5 as the base:
>>>
>>>          base (throughput)  with-patch
>>> tcp_crr  14545 (+- 80)      14463 (+- 56)
>>>
>>> It seems like the performance impact is within the noise.
>>>
>>> Link: https://github.com/google/neper [1]
>>> Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
>>
>> Hi Shakeel,
>>
>> I like the idea and the performance numbers look good. However, some
>> comments on the implementation:
>>
>
> Thanks for taking a look.
>
>>> ---
>>>
>>> Changes since the RFC:
>>> - Added check for already charged slab objects.
>>> - Added performance results from neper's tcp_crr
>>>
>>>  include/linux/slab.h            |  1 +
>>>  mm/slub.c                       | 54 +++++++++++++++++++++++++++++++++
>>>  net/ipv4/inet_connection_sock.c |  5 +--
>>>  3 files changed, 58 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>> index eb2bf4629157..05cfab107c72 100644
>>> --- a/include/linux/slab.h
>>> +++ b/include/linux/slab.h
>>> @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>>>  			   gfp_t gfpflags) __assume_slab_alignment __malloc;
>>>  #define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
>>>
>>> +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
>>>  void kmem_cache_free(struct kmem_cache *s, void *objp);
>>>
>>>  kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index c9d8a2497fd6..580683597b5c 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
>>>
>>>  	__memcg_slab_free_hook(s, slab, p, objects, obj_exts);
>>>  }
>>> +
>>> +static __fastpath_inline
>>> +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags)
>>> +{
>>> +	if (likely(!memcg_kmem_online()))
>>> +		return true;
>>
>> We do have this check in kmem_cache_charge(), why do we need to check
>> it again?
>>
>
> I missed removing this one. I am going to rearrange the code a bit
> more in these functions to avoid build errors in non-MEMCG builds.
>
>>> +
>>> +	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
>>> +}
>>> +
>>>  #else /* CONFIG_MEMCG */
>>>  static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
>>>  					      struct list_lru *lru,
>>> @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>>>  					 void **p, int objects)
>>>  {
>>>  }
>>> +
>>> +static inline bool memcg_slab_post_charge(struct kmem_cache *s,
>>> +					  void *p,
>>> +					  gfp_t flags)
>>> +{
>>> +	return true;
>>> +}
>>>  #endif /* CONFIG_MEMCG */
>>>
>>>  /*
>>> @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>>>  }
>>>  EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
>>>
>>> +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
>>> +		      SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
>>> +
>>> +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
>>> +{
>>> +	struct slabobj_ext *slab_exts;
>>> +	struct kmem_cache *s;
>>> +	struct folio *folio;
>>> +	struct slab *slab;
>>> +	unsigned long off;
>>> +
>>> +	if (!memcg_kmem_online())
>>> +		return true;
>>> +
>>> +	folio = virt_to_folio(objp);
>>> +	if (unlikely(!folio_test_slab(folio)))
>>> +		return false;
>>
>> Does it handle the case of a too-big-to-be-a-slab-object allocation?
>> I think it's better to handle it properly. Also, why return false
>> here?
>>
>
> Yes, I will fix the too-big-to-be-a-slab-object allocations. I presume
> I should just follow the kfree() handling on !folio_test_slab(), i.e.
> treat the given object as a large or too-big-to-be-a-slab-object
> allocation.

Hi Shakeel,

If we decide to do this, I suppose you will use memcg_kmem_charge_page()
to charge the big object. To be consistent, I suggest renaming
kmem_cache_charge() to memcg_kmem_charge() so that it handles both slab
objects and big objects.
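Roughly something like the following completely untested sketch, just to
illustrate the kfree()-style dispatch (it reuses memcg_slab_post_charge()
from this patch for the slab-object case):

bool memcg_kmem_charge(void *objp, gfp_t gfpflags)
{
	struct folio *folio;

	if (!memcg_kmem_online())
		return true;

	folio = virt_to_folio(objp);

	/*
	 * Too big to be a slab object: charge the whole compound page.
	 * memcg_kmem_charge_page() returns 0 on success.
	 */
	if (unlikely(!folio_test_slab(folio)))
		return !memcg_kmem_charge_page(&folio->page, gfpflags,
					       folio_order(folio));

	/* Slab object: charge it against its slab cache. */
	return memcg_slab_post_charge(folio_slab(folio)->slab_cache,
				      objp, gfpflags);
}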
And I see that all the functions related to object charging have been
moved to memcontrol.c (e.g. __memcg_slab_post_alloc_hook()), so maybe we
should also do this for memcg_kmem_charge()?

Thanks,
Muchun