On 10/07/2023 13:05, Barry Song wrote:
> On Thu, Jun 22, 2023 at 11:00 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>>
>> Hi All,
>> [...]
>>
>> Performance
>> -----------
>>
>> Below results show 2 benchmarks; kernel compilation and speedometer 2.0 (a
>> javascript benchmark running in Chromium). Both cases are running on Ampere
>> Altra with 1 NUMA node enabled, Ubuntu 22.04 and XFS filesystem. Each benchmark
>> is repeated 15 times over 5 reboots and averaged.
>>
>> All improvements are relative to baseline-4k. anonfolio and exefolio are as
>> described above. contpte is this series. (Note that exefolio only gives an
>> improvement because contpte is already in place).
>>
>> Kernel Compilation (smaller is better):
>>
>> | kernel       |   real-time |   kern-time |   user-time |
>> |:-------------|------------:|------------:|------------:|
>> | baseline-4k  |        0.0% |        0.0% |        0.0% |
>> | anonfolio    |       -5.4% |      -46.0% |       -0.3% |
>> | contpte      |       -6.8% |      -45.7% |       -2.1% |
>> | exefolio     |       -8.4% |      -46.4% |       -3.7% |
>
> sorry i am a bit confused. in exefolio case, is anonfolio included?
> or it only has large cont-pte folios on exe code? in the other words,
> Does the 8.4% improvement come from iTLB miss reduction only,
> or from both dTLB and iTLB miss reduction?

The anonfolio -> contpte -> exefolio results are incremental. So:

anonfolio: baseline-4k + anonfolio changes
contpte: anonfolio + contpte changes
exefolio: contpte + exefolio changes

So yes, exefolio includes anonfolio. Sorry for the confusion.

>
>> | baseline-16k |       -8.7% |      -49.2% |       -3.7% |
>> | baseline-64k |      -10.5% |      -66.0% |       -3.5% |
>>
>> Speedometer 2.0 (bigger is better):
>>
>> | kernel       |   runs_per_min |
>> |:-------------|---------------:|
>> | baseline-4k  |           0.0% |
>> | anonfolio    |           1.2% |
>> | contpte      |           3.1% |
>> | exefolio     |           4.2% |
>
> same question as above.

same answer as above.

Thanks,
Ryan

>
>> | baseline-16k |           5.3% |
>>
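
Since the anonfolio -> contpte -> exefolio results are cumulative, the share added at each step can be read off by subtracting neighbouring rows. The sketch below does that for the kernel compilation real-time column only. The function name and data layout are made up for illustration, and the deltas are percentage points of the baseline-4k time, not an improvement relative to the previous kernel. It says nothing about run-to-run noise or about which TLB a given step helps.

```python
# Cumulative real-time change vs. baseline-4k for kernel compilation,
# copied from the quoted table (percent; negative means faster).
cumulative = [
    ("anonfolio", -5.4),
    ("contpte", -6.8),
    ("exefolio", -8.4),
]


def incremental_steps(rows):
    """Return (name, delta) pairs, where delta is the change each kernel
    adds on top of the previous one, in percentage points of baseline-4k."""
    steps = []
    previous = 0.0  # baseline-4k is 0.0% by definition
    for name, value in rows:
        # Round to one decimal to match the precision of the table.
        steps.append((name, round(value - previous, 1)))
        previous = value
    return steps


for name, delta in incremental_steps(cumulative):
    print(f"{name}: {delta:+.1f} points")
# anonfolio: -5.4 points
# contpte: -1.4 points
# exefolio: -1.6 points
```

Read this way, the exefolio changes account for roughly 1.6 of the 8.4 points, with the remainder coming from the anonfolio and contpte steps they are stacked on.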