On Sat, 18 Jun 2022, Rongwei Wang wrote:

> > Well the cycle reduction is strange. Tests are not done in the same
> > environment? Maybe good to not use NUMA or bind to the same cpu
> It's the same environment. I can sure. And there are four nodes (32G per-node
> and 8 cores per-node) in my test environment. whether I need to test in one
> node? If right, I can try.

OK, in a NUMA environment the memory allocation is randomized at bootup.
You may get different numbers after you reboot the system. Try switching
NUMA off, or use a single node, to get consistent numbers.

It may be useful to figure out which memory structure causes the increase in
latency in a NUMA environment. If you can figure that out and properly
allocate the memory structure that causes the increase in latency, then
you may be able to increase the performance of the allocator.
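
For the single-node runs, something along these lines could keep the test process on the CPUs of one node. It is only a rough sketch and rests on assumptions: it reads the standard sysfs file /sys/devices/system/node/nodeN/cpulist and calls os.sched_setaffinity(), and the helper names (parse_cpulist, node_cpus, pin_to_node) are made up for illustration. It only restricts where the benchmark runs. It does not bind memory and has no effect on where kernel structures were placed at boot.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range like "0-3" is inclusive on both ends.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def node_cpus(node=0):
    """Return the CPU ids that sysfs reports for the given NUMA node."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_to_node(node=0):
    """Restrict the calling process to the CPUs of one NUMA node."""
    cpus = node_cpus(node)
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus


if __name__ == "__main__":
    try:
        print("pinned to CPUs:", pin_to_node(0))
    except (OSError, AttributeError) as err:
        # No NUMA sysfs entries, not Linux, or affinity change not permitted.
        print("could not pin to node 0:", err)
```

Running the benchmark under numactl with --cpunodebind=0 --membind=0 would also bind user-space memory to that node, which this sketch does not do.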