hi, Shakeel,

On Mon, May 27, 2024 at 11:30:38PM -0700, Shakeel Butt wrote:
> On Fri, May 24, 2024 at 11:06:54AM GMT, Shakeel Butt wrote:
> > On Fri, May 24, 2024 at 03:45:54PM +0800, Oliver Sang wrote:
> [...]
> > I will re-run my experiments on linus tree and report back.
>
> I am not able to reproduce the regression with the fix I have proposed,
> at least on my 1 node 52 CPUs (Cooper Lake) and 2 node 80 CPUs (Skylake)
> machines. Let me give more details below:
>
> Setup instructions:
> -------------------
> mount -t tmpfs tmpfs /tmp
> mkdir -p /sys/fs/cgroup/A
> mkdir -p /sys/fs/cgroup/A/B
> mkdir -p /sys/fs/cgroup/A/B/C
> echo +memory > /sys/fs/cgroup/A/cgroup.subtree_control
> echo +memory > /sys/fs/cgroup/A/B/cgroup.subtree_control
> echo $$ > /sys/fs/cgroup/A/B/C/cgroup.procs
>
> The base case (commit a4c43b8a0980):
> ------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,2796769,0.03,0,0.00,0
>
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,6755010,0.04,0,0.00,0
>
>
> The regressing series (last commit a94032b35e5f)
> ------------------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,2684859,0.03,0,0.00,0
>
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,6010438,0.13,0,0.00,0
>
> The fix on top of regressing series:
> ------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,3812133,0.02,0,0.00,0
>
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,7979893,0.15,0,0.00,0
>
>
> As you can see, the fix is improving the performance over the base, at
> least for me. I can only speculate that either the difference of
> hardware is giving us different results (you have newer CPUs) or there
> is still disparity of experiment setup/environment between us.
>
> Are you disabling hyperthreading? Is the prefetching heuristics
> different on your systems?

We don't disable hyperthreading. As for prefetching, we don't change the
BIOS default settings. For the SKL server in our original report:

MLC Spatial Prefetcher - enabled
DCU Data Prefetcher - enabled
DCU Instruction Prefetcher - enabled
LLC Prefetch - disabled

However, we don't make these settings uniform across all our servers.
For example, on the Ice Lake server mentioned in the previous mail,
"LLC Prefetch" defaults to enabled, so we keep it enabled there.

> Regarding test environment, can you check my setup instructions above
> and see if I am doing something wrong or different?
>
> At the moment, I am inclined towards asking Andrew to include my fix in
> following 6.10-rc* but keep this report open, so we continue to improve.
> Let me know if you have concerns.

Yeah, a different setup/environment could explain the difference. In any
case, once your fix is merged, we should be able to capture it as a
performance improvement. Or, if you would like us to do a manual check,
just let us know.

Thanks!

> thanks,
> Shakeel
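As an aside, the relative deltas implied by the throughput numbers quoted
above can be summarized with a short script. The percentages below are my
own arithmetic on the second column of the fully-loaded rows from the
runtest.py output; they are not part of the original report:

```python
# will-it-scale page_fault2 throughput quoted in the mail above:
# second column of the 52-task and 80-task rows for each kernel.
results = {
    52: {"base": 2796769, "regressed": 2684859, "fixed": 3812133},
    80: {"base": 6755010, "regressed": 6010438, "fixed": 7979893},
}

def pct(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100

for cpus, r in results.items():
    print(f"{cpus} CPUs: regressed vs base {pct(r['regressed'], r['base']):+.1f}%, "
          f"fixed vs base {pct(r['fixed'], r['base']):+.1f}%")
```

On these numbers the regressing series costs roughly 4% (52 CPUs) and
11% (80 CPUs) against the base, while the fix lands well above the base
on both machines, which matches the "improving the performance over the
base" observation in the mail.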