On Wed, May 31, 2023 at 10:31:09AM +0000, Chen, Zhiyin wrote:
> As Eric said, CONFIG_RANDSTRUCT_NONE is set in the default config
> and some production environments, including Ali Cloud. Therefore, it
> is worthful to optimize the file struct layout.
>
> Here are the syscall test results of unixbench.

Results look good, but the devil is in the detail....

> Command: numactl -C 3-18 ./Run -c 16 syscall

So the test is restricted to a set of adjacent cores within a single
CPU socket, so all the cachelines are typically being shared within
a single socket's CPU caches. IOWs, the fact there are 224 CPUs in
the machine is largely irrelevant for this microbenchmark.

i.e. is this a microbenchmark that is going faster simply because
the working set for the specific benchmark now fits in L2 or L3
cache when it didn't before?

Does this same result occur for different CPU types, cache sizes
and architectures? What about when the cores used by the benchmark
are spread across multiple sockets so the cost of remote cacheline
access is taken into account? If this is actually a real benefit,
then we should see similar or even larger gains between CPU cores
that are further apart because the cost of false cacheline sharing
is higher in those systems....

> Without patch
> ------------------------
> 224 CPUs in system; running 16 parallel copies of tests
> System Call Overhead                        5611223.7 lps   (10.0 s, 7 samples)
> System Benchmarks Partial Index              BASELINE       RESULT    INDEX
> System Call Overhead                          15000.0    5611223.7   3740.8
>                                                                    ========
> System Benchmarks Index Score (Partial Only)                         3740.8
>
> With patch
> ------------------------------------------------------------------------
> 224 CPUs in system; running 16 parallel copies of tests
> System Call Overhead                        7567076.6 lps   (10.0 s, 7 samples)
> System Benchmarks Partial Index              BASELINE       RESULT    INDEX
> System Call Overhead                          15000.0    7567076.6   5044.7
>                                                                    ========
> System Benchmarks Index Score (Partial Only)                         5044.7

Where is all this CPU time being saved? Do you have a profile
showing what functions in the kernel are running far more
efficiently now?

Yes, the results look good, but if all this change is doing is
micro-optimising a single code path, it's much less impressive and
far more likely that it has no impact on real-world performance...

More information, please!

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx