hi, Shakeel,

On Mon, May 27, 2024 at 11:30:38PM -0700, Shakeel Butt wrote:
> On Fri, May 24, 2024 at 11:06:54AM GMT, Shakeel Butt wrote:
> > On Fri, May 24, 2024 at 03:45:54PM +0800, Oliver Sang wrote:
> [...]
> > I will re-run my experiments on linus tree and report back.
>
> I am not able to reproduce the regression with the fix I have proposed,
> at least on my 1 node 52 CPUs (Cooper Lake) and 2 node 80 CPUs (Skylake)
> machines. Let me give more details below:
>
> Setup instructions:
> -------------------
> mount -t tmpfs tmpfs /tmp
> mkdir -p /sys/fs/cgroup/A
> mkdir -p /sys/fs/cgroup/A/B
> mkdir -p /sys/fs/cgroup/A/B/C
> echo +memory > /sys/fs/cgroup/A/cgroup.subtree_control
> echo +memory > /sys/fs/cgroup/A/B/cgroup.subtree_control
> echo $$ > /sys/fs/cgroup/A/B/C/cgroup.procs
>
> The base case (commit a4c43b8a0980):
> ------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,2796769,0.03,0,0.00,0
>
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,6755010,0.04,0,0.00,0
>
>
> The regressing series (last commit a94032b35e5f)
> ------------------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,2684859,0.03,0,0.00,0
>
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,6010438,0.13,0,0.00,0
>
> The fix on top of regressing series:
> ------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,3812133,0.02,0,0.00,0
>
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,7979893,0.15,0,0.00,0
>
>
> As you can see, the fix is improving the performance over the base, at
> least for me. I can only speculate that either the difference of
> hardware is giving us different results (you have newer CPUs) or there
> is still disparity of experiment setup/environment between us.
>
> Are you disabling hyperthreading? Is the prefetching heuristics
> different on your systems?

We don't disable hyperthreading. As for prefetching, we don't change the
BIOS default settings. For the SKL server in our original report:

MLC Spatial Prefetcher - enabled
DCU Data Prefetcher - enabled
DCU Instruction Prefetcher - enabled
LLC Prefetch - disabled

However, we don't make these settings uniform across all our servers.
For example, on the Ice Lake server mentioned in the previous mail,
"LLC Prefetch" defaults to enabled, so we keep it enabled there.

> Regarding test environment, can you check my setup instructions above
> and see if I am doing something wrong or different?
>
> At the moment, I am inclined towards asking Andrew to include my fix in
> following 6.10-rc* but keep this report open, so we continue to improve.
> Let me know if you have concerns.

Yeah, a different setup/environment could explain the difference. In any
case, once your fix is merged, we should be able to capture it as a
performance improvement. Or, if you would like us to do a manual check,
just let us know.

Thanks!

> thanks,
> Shakeel
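As an aside, the relative deltas implied by the throughput numbers quoted
above can be summarized with a short script. The percentages below are my
own arithmetic on the second column of the fully-loaded rows from the
runtest.py output; they are not part of the original report:

```python
# will-it-scale page_fault2 throughput quoted in the mail above:
# second column of the 52-task and 80-task rows for each kernel.
results = {
    52: {"base": 2796769, "regressed": 2684859, "fixed": 3812133},
    80: {"base": 6755010, "regressed": 6010438, "fixed": 7979893},
}

def pct(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100

for cpus, r in results.items():
    print(f"{cpus} CPUs: regressed vs base {pct(r['regressed'], r['base']):+.1f}%, "
          f"fixed vs base {pct(r['fixed'], r['base']):+.1f}%")
```

On these numbers the regressing series costs roughly 4% (52 CPUs) and
11% (80 CPUs) against the base, while the fix lands well above the base
on both machines, which matches the "improving the performance over the
base" observation in the mail.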