On Thu, Jun 01, 2023 at 05:24:00AM -0400, chenzhiyin wrote:
> In the syscall test of UnixBench, performance regression occurred due
> to false sharing.
>
> The lock and atomic members, including file::f_lock, file::f_count and
> file::f_pos_lock are highly contended and frequently updated in the
> high-concurrency test scenarios. perf c2c identified one affected
> read access, file::f_op.
> To prevent false sharing, the layout of file struct is changed as
> follows:
> (A) f_lock, f_count and f_pos_lock are put together to share the same
> cache line.
> (B) The read mostly members, including f_path, f_inode, f_op are put
> into a separate cache line.
> (C) f_mode is put together with f_count, since they are used frequently
> at the same time.
> Due to '__randomize_layout' attribute of file struct, the updated layout
> can only take effect when CONFIG_RANDSTRUCT_NONE is 'y'.
>
> The optimization has been validated in the syscall test of UnixBench.
> Performance gain is 30~50%. Furthermore, to confirm the optimization
> effectiveness on the other code paths, the results of fsdisk, fsbuffer
> and fstime are also shown.
>
> Here are the detailed test results of unixbench.
>
> Command: numactl -C 3-18 ./Run -c 16 syscall fsbuffer fstime fsdisk
>
> Without Patch
> ------------------------------------------------------------------------
> File Copy 1024 bufsize 2000 maxblocks       875052.1 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks         235484.0 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks      2815153.5 KBps  (30.0 s, 2 samples)
> System Call Overhead                       5772268.3 lps   (10.0 s, 7 samples)
>
> System Benchmarks Partial Index             BASELINE       RESULT    INDEX
> File Copy 1024 bufsize 2000 maxblocks         3960.0     875052.1   2209.7
> File Copy 256 bufsize 500 maxblocks           1655.0     235484.0   1422.9
> File Copy 4096 bufsize 8000 maxblocks         5800.0    2815153.5   4853.7
> System Call Overhead                         15000.0    5772268.3   3848.2
>                                                                   ========
> System Benchmarks Index Score (Partial Only)                        2768.3
>
> With Patch
> ------------------------------------------------------------------------
> File Copy 1024 bufsize 2000 maxblocks      1009977.2 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks         264765.9 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks      3052236.0 KBps  (30.0 s, 2 samples)
> System Call Overhead                       8237404.4 lps   (10.0 s, 7 samples)
>
> System Benchmarks Partial Index             BASELINE       RESULT    INDEX
> File Copy 1024 bufsize 2000 maxblocks         3960.0    1009977.2   2550.4
> File Copy 256 bufsize 500 maxblocks           1655.0     264765.9   1599.8
> File Copy 4096 bufsize 8000 maxblocks         5800.0    3052236.0   5262.5
> System Call Overhead                         15000.0    8237404.4   5491.6
>                                                                   ========
> System Benchmarks Index Score (Partial Only)                        3295.3
>
> Signed-off-by: chenzhiyin <zhiyin.chen@xxxxxxxxx>
> ---

Dave had some more concerns and perf analysis requests for this. So this
will be put on hold until these are addressed.
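
For anyone following along, the grouping described in (A)-(C) of the quoted
commit message can be illustrated with a small userspace sketch. This is not
the patch and not the kernel's struct file: the name struct file_sketch, the
simplified field types, the LINE_OF() helper and the 64-byte cache line size
are all assumptions made purely for illustration.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* offsetof */

#define CACHE_LINE 64 /* assumed cache line size, not taken from the patch */

/* Hypothetical stand-in for struct file; field types are simplified. */
struct file_sketch {
	/* (A) + (C): frequently written members, plus f_mode, share a line */
	alignas(CACHE_LINE) int f_lock;
	unsigned int f_mode;
	long f_count;
	long f_pos_lock;

	/* (B): read-mostly members start on a separate cache line */
	alignas(CACHE_LINE) const void *f_path;
	const void *f_inode;
	const void *f_op;
};

/* Index of the cache line that a member starts in (illustrative helper). */
#define LINE_OF(member) (offsetof(struct file_sketch, member) / CACHE_LINE)

/* Compile-time checks that the sketch really has the intended grouping. */
static_assert(LINE_OF(f_lock) == LINE_OF(f_count), "f_lock shares f_count's line");
static_assert(LINE_OF(f_pos_lock) == LINE_OF(f_count), "f_pos_lock shares f_count's line");
static_assert(LINE_OF(f_mode) == LINE_OF(f_count), "f_mode sits with f_count");
static_assert(LINE_OF(f_op) != LINE_OF(f_count), "f_op is kept off the hot line");

/* Returns 1 when the read-mostly members share one line that is distinct
 * from the line holding the frequently written members, 0 otherwise. */
int hot_and_read_mostly_are_separated(void)
{
	return LINE_OF(f_path) == LINE_OF(f_inode) &&
	       LINE_OF(f_inode) == LINE_OF(f_op) &&
	       LINE_OF(f_op) != LINE_OF(f_lock);
}
```

With this kind of layout, writers bouncing the line that holds f_lock,
f_count and f_pos_lock no longer invalidate the line that readers of f_op
depend on, which is the false sharing that perf c2c pointed at in the quoted
message. In the kernel, as the commit message notes, any such ordering only
holds when structure layout randomization is disabled.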