On 4/14/21 11:26 PM, Masayoshi Mizuma wrote:
Hi Longman,
Thank you for your patches.
I reran the benchmark with your patches, but the reduction seems to be
small... The total durations of the sendto() and recvfrom() system calls
during the benchmark are as follows.
- sendto
  - v5.8 vanilla: 2576.056 msec (100%)
  - v5.12-rc7 vanilla: 2988.911 msec (116%)
  - v5.12-rc7 with your patches (1-5): 2984.307 msec (115%)
- recvfrom
  - v5.8 vanilla: 2113.156 msec (100%)
  - v5.12-rc7 vanilla: 2305.810 msec (109%)
  - v5.12-rc7 with your patches (1-5): 2287.351 msec (108%)
kmem_cache_alloc()/kmem_cache_free() are called around 1,400,000 times during
the benchmark. I ran a loop in a kernel module as follows. The duration
of that loop is indeed reduced by your patches.
---
dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
for (i = 0; i < 1400000; i++) {
	p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
	kmem_cache_free(dummy_cache, p);
}
---
- v5.12-rc7 vanilla: 110 msec (100%)
- v5.12-rc7 with your patches (1-5): 85 msec (77%)
It seems that the reduction is small for the benchmark though...
Anyway, I can see your patches reduce the overhead.
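For a rough sense of scale, the figures above can be put side by side.
The following is only a minimal userspace sketch; the helper names are
hypothetical, and it assumes the 1,400,000 alloc/free pairs of the
kernel module loop are comparable to the calls made during the benchmark,
which may not hold.

```c
#include <stdio.h>

/* Duration relative to a baseline, in percent (e.g. 85 vs 110 msec). */
static double percent_of(double value_msec, double baseline_msec)
{
	return value_msec / baseline_msec * 100.0;
}

/* Average saving per alloc/free pair, in nanoseconds. */
static double saving_per_pair_ns(double before_msec, double after_msec,
				 long pairs)
{
	return (before_msec - after_msec) * 1e6 / (double)pairs;
}

/*
 * Print one comparison line and return the reduction (in msec) that the
 * patches give over the vanilla kernel of the same version.
 */
static double report(const char *name, double base_msec,
		     double vanilla_msec, double patched_msec)
{
	printf("%-10s vanilla %.0f%%, patched %.0f%%, reduction %.3f msec\n",
	       name,
	       percent_of(vanilla_msec, base_msec),
	       percent_of(patched_msec, base_msec),
	       vanilla_msec - patched_msec);
	return vanilla_msec - patched_msec;
}
```

With the numbers above, saving_per_pair_ns(110.0, 85.0, 1400000) comes to
roughly 17.9 nsec per pair, i.e. about 25 msec over the whole loop, while
the sendto duration grew by roughly 413 msec between v5.8 and v5.12-rc7.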
Please feel free to add:
Tested-by: Masayoshi Mizuma <m.mizuma@xxxxxxxxxxxxxx>
Thanks!
Masa
Thanks for the testing.
I was focusing on your kernel module benchmark when testing my patches. I
will try out your pgbench benchmark to see whether any other tuning
can be done.
BTW, how many NUMA nodes does your test machine have? I did my testing on a
2-socket system. The vmstat caching part may be less effective on
systems with more NUMA nodes. I will try to find a larger 4-socket
system for testing.
Cheers,
Longman