Re: [PATCH v3 0/5] mm/memcg: Reduce kmemcache memory accounting overhead

Waiman Long <llong@xxxxxxxxxx> · Thu, 15 Apr 2021 09:17:37 -0400

On 4/14/21 11:26 PM, Masayoshi Mizuma wrote:

Hi Longman,

Thank you for your patches.
I reran the benchmark with your patches, and the reduction seems
small. The total durations of the sendto() and recvfrom() system calls
during the benchmark are as follows.

- sendto
   - v5.8 vanilla:                      2576.056 msec (100%)
   - v5.12-rc7 vanilla:                 2988.911 msec (116%)
   - v5.12-rc7 with your patches (1-5): 2984.307 msec (115%)

- recvfrom
   - v5.8 vanilla:                      2113.156 msec (100%)
   - v5.12-rc7 vanilla:                 2305.810 msec (109%)
   - v5.12-rc7 with your patches (1-5): 2287.351 msec (108%)

kmem_cache_alloc()/kmem_cache_free() are called around 1,400,000 times during
the benchmark. I also ran the following loop in a kernel module, and there
your patches do reduce the duration.

   ---
   /* SLAB_ACCOUNT: objects are charged to the allocating task's memcg */
   dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
   for (i = 0; i < 1400000; i++) {
	p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
	kmem_cache_free(dummy_cache, p);
   }
   ---

- v5.12-rc7 vanilla:                 110 msec (100%)
- v5.12-rc7 with your patches (1-5):  85 msec (77%)

The reduction for the benchmark itself seems small, though...
Anyway, I can see that your patches reduce the overhead.
Please feel free to add:

	Tested-by: Masayoshi Mizuma <m.mizuma@xxxxxxxxxxxxxx>

Thanks!
Masa

Thanks for the testing.

When testing my patches, I focused on your kernel module benchmark. I
will try out your pgbench benchmark to see if any other tuning can be
done.

BTW, how many NUMA nodes does your test machine have? I did my testing
on a 2-socket system. The vmstat caching part may be less effective on
systems with more NUMA nodes. I will try to find a larger 4-socket
system for testing.

Cheers,
Longman