On Fri, 17 Jun 2022, Rongwei Wang wrote:

> Christoph, I referred to [1] to test some data below. The slub_test case is
> the same as the one you provided. Here are the results of the test (the
> baseline is the data of the upstream kernel, and "fix" is the result of the
> patched kernel).

Ah good.

> Single thread testing
>
> 1. Kmalloc: Repeatedly allocate then free test
>
>                        before (baseline)         fix
>                        kmalloc     kfree         kmalloc     kfree
> 10000 times 8          7 cycles    8 cycles      5 cycles    7 cycles
> 10000 times 16         4 cycles    8 cycles      3 cycles    6 cycles
> 10000 times 32         4 cycles    8 cycles      3 cycles    6 cycles

Well, the cycle reduction is strange. Were the tests not done in the same
environment? It may be good to avoid NUMA effects, or to bind to the same CPU.

> 10000 times 64         3 cycles    8 cycles      3 cycles    6 cycles
> 10000 times 128        3 cycles    8 cycles      3 cycles    6 cycles
> 10000 times 256        12 cycles   8 cycles      11 cycles   7 cycles
> 10000 times 512        27 cycles   10 cycles     23 cycles   11 cycles
> 10000 times 1024       18 cycles   9 cycles      20 cycles   10 cycles
> 10000 times 2048       54 cycles   12 cycles     54 cycles   12 cycles
> 10000 times 4096       105 cycles  20 cycles     105 cycles  25 cycles
> 10000 times 8192       210 cycles  35 cycles     212 cycles  39 cycles
> 10000 times 16384      133 cycles  45 cycles     119 cycles  46 cycles

Seems to be different environments.

> According to the above data, there seems to be no significant performance
> degradation in the patched kernel. Also, in the concurrent allocation test,
> such as Kmalloc N*alloc N*free(1024), the data in the 'fix' column is better
> than the baseline (lower is better, as far as I can tell; if I am wrong,
> please let me know). If you have other suggestions, I can test more data.

Well, can you explain the cycle reduction?
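For what it's worth, one way to reduce run-to-run variance is to keep the
whole timing loop on a single CPU so cross-CPU and cross-node noise does not
skew the averages. Below is a minimal sketch in that spirit; it is not the
actual slub_test module, and the module name, object size, iteration count,
and return convention are only illustrative:

/*
 * Illustrative only: time kmalloc/kfree pairs while pinned to one CPU.
 * This is a sketch, not the slub_test module used for the numbers above.
 */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/smp.h>		/* get_cpu() / put_cpu() */
#include <linux/timex.h>	/* get_cycles() */

static int __init kmalloc_cycles_init(void)
{
	cycles_t start, alloc_cycles = 0, free_cycles = 0;
	void *p;
	int i, cpu;

	cpu = get_cpu();	/* disable preemption, stay on this CPU */

	for (i = 0; i < 10000; i++) {
		start = get_cycles();
		p = kmalloc(64, GFP_ATOMIC);	/* atomic: preemption is off */
		alloc_cycles += get_cycles() - start;

		start = get_cycles();
		kfree(p);
		free_cycles += get_cycles() - start;
	}

	put_cpu();

	pr_info("cpu %d: kmalloc(64) %llu cycles, kfree %llu cycles (avg)\n",
		cpu,
		(unsigned long long)alloc_cycles / 10000,
		(unsigned long long)free_cycles / 10000);

	return -EAGAIN;		/* measurement only, do not stay loaded */
}
module_init(kmalloc_cycles_init);

MODULE_LICENSE("GPL");

Running both the baseline and the patched kernel with the loop pinned like
this (and on the same machine) should make the per-size cycle counts directly
comparable.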