On 6/17/22 10:19 PM, Christoph Lameter wrote:
> On Fri, 17 Jun 2022, Rongwei Wang wrote:
>> Christoph, I used [1] to collect the data below. The slub_test cases are
>> the same as the ones you provided, and here are the results (the baseline
>> columns are from the upstream kernel, the fix columns from the patched
>> kernel).
> Ah good.
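
(For reference, my understanding of the single-thread case in [1] is a
timed loop of roughly the following shape. This is only a sketch, not the
exact module code; the use of get_cycles() and the helper name
kmalloc_kfree_test are my assumptions.)

#include <linux/slab.h>
#include <linux/printk.h>
#include <linux/timex.h>	/* cycles_t, get_cycles() */

#define N 10000

static void *objs[N];

/* Time N kmalloc() calls, then N kfree() calls, for one object size,
 * and report the average cycles per operation. */
static void kmalloc_kfree_test(size_t size)
{
	cycles_t t0, t1, t2;
	int i;

	t0 = get_cycles();
	for (i = 0; i < N; i++)
		objs[i] = kmalloc(size, GFP_KERNEL);
	t1 = get_cycles();
	for (i = 0; i < N; i++)
		kfree(objs[i]);
	t2 = get_cycles();

	pr_info("%d times %zu: kmalloc %lu cycles, kfree %lu cycles\n",
		N, size,
		(unsigned long)(t1 - t0) / N,
		(unsigned long)(t2 - t1) / N);
}
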
>> Single thread testing
>>
>> 1. Kmalloc: Repeatedly allocate then free test
>>                      before (baseline)        fix
>>                      kmalloc    kfree         kmalloc    kfree
>> 10000 times 8          7 cycles   8 cycles      5 cycles   7 cycles
>> 10000 times 16         4 cycles   8 cycles      3 cycles   6 cycles
>> 10000 times 32         4 cycles   8 cycles      3 cycles   6 cycles
> Well the cycle reduction is strange. Tests are not done in the same
> environment? Maybe it would be good to not use NUMA, or to bind the test
> to the same cpu.
It's the same environment, I can confirm. There are four nodes in my test
machine (32G of memory and 8 cores per node). Do you mean I should run the
test within a single node? If so, I can try that (see the pinning sketch
at the end of this mail).
>> 10000 times 64         3 cycles   8 cycles      3 cycles   6 cycles
>> 10000 times 128        3 cycles   8 cycles      3 cycles   6 cycles
>> 10000 times 256       12 cycles   8 cycles     11 cycles   7 cycles
>> 10000 times 512       27 cycles  10 cycles     23 cycles  11 cycles
>> 10000 times 1024      18 cycles   9 cycles     20 cycles  10 cycles
>> 10000 times 2048      54 cycles  12 cycles     54 cycles  12 cycles
>> 10000 times 4096     105 cycles  20 cycles    105 cycles  25 cycles
>> 10000 times 8192     210 cycles  35 cycles    212 cycles  39 cycles
>> 10000 times 16384    133 cycles  45 cycles    119 cycles  46 cycles
> Seems to be different environments.
>> According to the above data, there seems to be no significant performance
>> degradation in the patched kernel. Plus, in the concurrent alloc tests,
>> such as Kmalloc N*alloc N*free(1024), the 'fix' column is better than the
>> baseline (fewer cycles is better, if I understand correctly; please let
>> me know if not). And if you have other suggestions, I can test more data.
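
(The concurrent "N*alloc N*free" case, as I understand it, runs one worker
per online CPU, each allocating N objects and then freeing them. Again a
rough sketch under that assumption, not the exact code from [1]; the
kthread-based structure and the names are mine.)

#include <linux/completion.h>
#include <linux/cpumask.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/slab.h>
#include <linux/smp.h>
#include <linux/timex.h>

#define N 1024

static DECLARE_COMPLETION(test_done);

/* Each worker: N kmallocs, then N kfrees, with per-CPU cycle counts. */
static int alloc_free_thread(void *arg)
{
	size_t size = (size_t)arg;
	void **objs;
	cycles_t t0, t1, t2;
	int i;

	objs = kmalloc_array(N, sizeof(void *), GFP_KERNEL);
	if (objs) {
		t0 = get_cycles();
		for (i = 0; i < N; i++)
			objs[i] = kmalloc(size, GFP_KERNEL);
		t1 = get_cycles();
		for (i = 0; i < N; i++)
			kfree(objs[i]);
		t2 = get_cycles();

		pr_info("cpu%d: alloc %lu cycles/op, free %lu cycles/op\n",
			raw_smp_processor_id(),
			(unsigned long)(t1 - t0) / N,
			(unsigned long)(t2 - t1) / N);
		kfree(objs);
	}
	complete(&test_done);
	return 0;
}

static void run_concurrent(size_t size)
{
	struct task_struct *t;
	int cpu, started = 0;

	for_each_online_cpu(cpu) {
		t = kthread_create(alloc_free_thread, (void *)size,
				   "slub_test/%d", cpu);
		if (IS_ERR(t))
			continue;
		kthread_bind(t, cpu);	/* one worker pinned per CPU */
		wake_up_process(t);
		started++;
	}
	while (started--)
		wait_for_completion(&test_done);
}
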
> Well, can you explain the cycle reduction?
Maybe it is because of the four nodes in my system, or because each node
has only 8 cores (quite small)? Thanks, you remind me that I should
increase the number of cores per node, or vary the number of nodes, and
compare the results.
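
For the single-node run, my plan is to pin the task that loads the test
module, since the module init (where the benchmark runs) executes in the
context of the loading process. A minimal userspace sketch, assuming CPU 0
sits on node 0 in my topology (this wrapper is my own, not part of [1]):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	cpu_set_t set;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <module.ko>\n", argv[0]);
		return 1;
	}

	/* Pin ourselves to CPU 0 (assumed to be on node 0). */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}

	/* insmod inherits the affinity, so the benchmark in the module
	 * init path starts on CPU 0 as well. */
	execlp("insmod", "insmod", argv[1], (char *)NULL);
	perror("execlp");
	return 1;
}
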
Thanks!