On Fri, Jan 5, 2024 at 1:29 AM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
>
> hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
> > hi, Fengwei, hi, Yang Shi,
> >
> > On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> > >
> > > On 2024/1/4 09:32, Yang Shi wrote:
> > > > ...
> > > >
> > > > Can you please help test the below patch?
> > > I can't access the testing box now. Oliver will help to test your patch.
> > >
> >
> > since now the commit-id of
> > 'mm: align larger anonymous mappings on THP boundaries'
> > in linux-next/master is efa7df3e3bb5d
> > I applied the patch like below:
> >
> > * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> > * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
> > * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
> >
> > our auto-bisect captured new efa7df3e3b as fbc for quite a number of regression
> > so far, I will test d8d7b1dae6f03 for all these tests. Thanks
> >
>

Hi Oliver,

Thanks for running the test. Please see the inline comments.

> we got 12 regressions and 1 improvement results for efa7df3e3b so far.
> (4 regressions are just similar to what we reported for 1111d46b5c).
> by your patch, 6 of those regressions are fixed, others are not impacted.
>
> below is a summary:
>
> No.   testsuite       test                            status-on-efa7df3e3b  fix-by-d8d7b1dae6 ?
> ===   =========       ====                            ====================  ===================
> (1)   stress-ng       numa                            regression            NO
> (2)                   pthread                         regression            yes (on a Ice Lake server)
> (3)                   pthread                         regression            yes (on a Cascade Lake desktop)
> (4)   will-it-scale   malloc1                         regression            NO

I think this was reported earlier, when Rik submitted the patch in the
first place. IIRC, Huang Ying did some analysis on this one and thought
it can be ignored.

> (5)                   page_fault1                     improvement           no (so still improvement)
> (6)   vm-scalability  anon-w-seq-mt                   regression            yes
> (7)   stream          nr_threads=25%                  regression            yes
> (8)                   nr_threads=50%                  regression            yes
> (9)   phoronix        osbench.CreateThreads           regression            yes (on a Cascade Lake server)
> (10)                  ramspeed.Add.Integer            regression            NO (and below 3, on a Coffee Lake desktop)
> (11)                  ramspeed.Average.FloatingPoint  regression            NO
> (12)                  ramspeed.Triad.Integer          regression            NO
> (13)                  ramspeed.Average.Integer        regression            NO

Not fixing the ramspeed regression is expected. But it seems that
neither Fengwei nor I can reproduce the regression when running
ramspeed alone.

>
>
> below are details, for those regressions not fixed by d8d7b1dae6, attached
> full comparison.
>
>
> (1) detail comparison is attached as 'stress-ng-regression'
>
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     251.12            -48.2%    130.00            -47.9%    130.75        stress-ng.numa.ops
>       4.10            -49.4%      2.08            -49.2%      2.09        stress-ng.numa.ops_per_sec

This is a new one. I did some analysis, and it does not seem to be
related to the THP patch, since I can reproduce it on a kernel without
the THP patch (on an aarch64 VM) if I set THP to always.

Profiling showed the regression was caused by the move_pages() syscall.
The test calls a bunch of NUMA syscalls, for example set_mempolicy(),
mbind(), move_pages(), migrate_pages(), etc., with different parameters.
When calling move_pages(), it tries to move pages (at base page
granularity) to different nodes in a circular list. On my 2-node NUMA
VM, it actually moves:

0th page to node #1
1st page to node #0
2nd page to node #1
3rd page to node #0
....
1023rd page to node #0

But for THP, it actually bounces the THP between the two nodes 512
times.

The pgmigrate_success counter in /proc/vmstat also reflects this: for
base pages the delta is 1928431, but for the THP case the delta is
218466402.

The kernel already does a node check to skip the move if the page is
already on the target node, but the test case does the bounce on
purpose since it just assumes base pages. So I think this case should
be run with THP disabled.

>
>
> (2)
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    3272223            -87.8%    400430             +0.5%   3287322        stress-ng.pthread.ops
>      54516            -87.8%      6664             +0.5%     54772        stress-ng.pthread.ops_per_sec
>
>
> (3)
> Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with memory: 128G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    2250845            -85.2%    332370  ± 6%       -0.8%   2232820        stress-ng.pthread.ops
>      37510            -85.2%      5538  ± 6%       -0.8%     37209        stress-ng.pthread.ops_per_sec
>
>
> (4) full comparison attached as 'will-it-scale-regression'
>
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      10994            -86.7%      1466            -86.7%      1460        will-it-scale.per_process_ops
>    1231431            -86.7%    164315            -86.7%    163624        will-it-scale.workload
>
>
> (5)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>   18858970            +44.8%  27298921            +44.9%  27330479        will-it-scale.224.threads
>      56.06            +13.3%     63.53            +13.8%     63.81        will-it-scale.224.threads_idle
>      84191            +44.8%    121869            +44.9%    122010        will-it-scale.per_thread_ops
>   18858970            +44.8%  27298921            +44.9%  27330479        will-it-scale.workload
>
>
> (6)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     345968             -6.5%    323566             +0.1%    346304        vm-scalability.median
>       1.91 ± 10%        -0.5      1.38 ± 20%        -0.2      1.75 ± 13%  vm-scalability.median_stddev%
>   79708409             -7.4%  73839640             -0.1%  79613742        vm-scalability.throughput
>
>
> (7)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
>   50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     349414            -16.2%    292854  ± 2%       -0.4%    348048        stream.add_bandwidth_MBps
>     347727  ± 2%      -16.5%    290470  ± 2%       -0.6%    345750  ± 2%  stream.add_bandwidth_MBps_harmonicMean
>     332206            -21.6%    260428  ± 3%       -0.4%    330838        stream.copy_bandwidth_MBps
>     330746  ± 2%      -22.6%    255915  ± 3%       -0.6%    328725  ± 2%  stream.copy_bandwidth_MBps_harmonicMean
>     301178            -16.9%    250209  ± 2%       -0.4%    299920        stream.scale_bandwidth_MBps
>     300262            -17.7%    247151  ± 2%       -0.6%    298586  ± 2%  stream.scale_bandwidth_MBps_harmonicMean
>     337408            -12.5%    295287  ± 2%       -0.3%    336304        stream.triad_bandwidth_MBps
>     336153            -12.7%    293621             -0.5%    334624  ± 2%  stream.triad_bandwidth_MBps_harmonicMean
>
>
> (8)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
>   50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/50%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     345632            -19.7%    277550  ± 3%       +0.4%    347067  ± 2%  stream.add_bandwidth_MBps
>     342263  ± 2%      -19.7%    274704  ± 2%       +0.4%    343609  ± 2%  stream.add_bandwidth_MBps_harmonicMean
>     343820            -17.3%    284428  ± 3%       +0.1%    344248        stream.copy_bandwidth_MBps
>     341759  ± 2%      -17.8%    280934  ± 3%       +0.1%    342025  ± 2%  stream.copy_bandwidth_MBps_harmonicMean
>     343270            -17.8%    282330  ± 3%       +0.3%    344276  ± 2%  stream.scale_bandwidth_MBps
>     340812  ± 2%      -18.3%    278284  ± 3%       +0.3%    341672  ± 2%  stream.scale_bandwidth_MBps_harmonicMean
>     364596            -19.7%    292831  ± 3%       +0.4%    366145  ± 2%  stream.triad_bandwidth_MBps
>     360643  ± 2%      -19.9%    289034  ± 3%       +0.4%    362004  ± 2%  stream.triad_bandwidth_MBps_harmonicMean
>
>
> (9)
> Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with memory: 512G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Create Threads/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      26.82          +1348.4%    388.43             +4.0%     27.88        phoronix-test-suite.osbench.CreateThreads.us_per_event
>
>
> **** for below (10) - (13), full comparison is attached as phoronix-regressions
> (they all happen on a Coffee Lake desktop)
> (10)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      20115             -4.5%     19211             -4.5%     19217        phoronix-test-suite.ramspeed.Add.Integer.mb_s
>
>
> (11)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      19960             -2.9%     19378             -3.0%     19366        phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
>
>
> (12)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      19667             -6.4%     18399             -6.4%     18413        phoronix-test-suite.ramspeed.Triad.Integer.mb_s
>
>
> (13)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      19799             -3.5%     19106             -3.4%     19117        phoronix-test-suite.ramspeed.Average.Integer.mb_s
>
>
>
> >
> >
> > commit d8d7b1dae6f0311d528b289cda7b317520f9a984
> > Author: 0day robot <lkp@xxxxxxxxx>
> > Date:   Thu Jan 4 12:51:10 2024 +0800
> >
> >     fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 40d94411d4920..91197bd387730 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> >         return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> >                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> >                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> >                arch_calc_vm_flag_bits(flags);
> >  }
> >
> >
> > >
> > > Regards
> > > Yin, Fengwei
> > >
> > > >
> > > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > > index 40d94411d492..dc7048824be8 100644
> > > > --- a/include/linux/mman.h
> > > > +++ b/include/linux/mman.h
> > > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > >         return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> > > >                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > > >                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > > > +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> > > >                arch_calc_vm_flag_bits(flags);
> > > >  }
> > > >
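
For reference, the move_pages() pattern described for the stress-ng numa
case can be modelled in a few lines of C. This is only a rough sketch
under stated assumptions, not the stress-ng code and not the kernel's
migration path: count_migrations() is a hypothetical helper, every page
is assumed to start on node 0, a THP is assumed to cover 512 base pages
(2 MiB THP with 4 KiB pages), and a request that names any base page is
assumed to move the whole THP containing it.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES  1024  /* base pages named in one simulated move_pages() call */
#define NR_NODES  2     /* 2-node NUMA VM, as in the description */
#define THP_PAGES 512   /* assumed: base pages per THP (2 MiB THP, 4 KiB pages) */

/*
 * Hypothetical helper: count the migrations one simulated call triggers
 * when memory moves in units of `unit` base pages (1 = base pages,
 * THP_PAGES = THP-backed memory).
 * Assumptions: every unit starts on node 0, and a request naming any
 * base page moves the whole unit that contains it.
 */
static unsigned long count_migrations(size_t unit)
{
	int node_of[NR_PAGES] = { 0 };	/* current node of each unit */
	unsigned long migrations = 0;

	assert(unit > 0 && NR_PAGES % unit == 0);

	for (size_t i = 0; i < NR_PAGES; i++) {
		/* circular target list: page 0 -> node 1, page 1 -> node 0, ... */
		int target = (int)((i + 1) % NR_NODES);
		size_t u = i / unit;

		if (node_of[u] == target)
			continue;	/* already on the target node: skipped */
		node_of[u] = target;
		migrations++;
	}
	return migrations;
}
```

Under these assumptions count_migrations(1) reports 512 moves of a
single base page each, while count_migrations(THP_PAGES) reports 1024
moves of 512 base pages each, i.e. each of the two THPs is bounced 512
times. The real pgmigrate_success deltas will not match these numbers,
since the test issues many other NUMA calls as well.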