On Fri, Jun 9, 2023 at 7:04 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> On Fri, 09 Jun 2023 01:59:35 +0100,
> Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
> >
> > TLDR
> > ====
> > Apache Spark spent 12% less time sorting four billion random integers
> > twenty times (in ~4 hours) after this patchset [1].
>
> Why are the 3 architectures you have considered being evaluated with 3
> different benchmarks?

I was hoping that people with special interests in different archs
might try to reproduce the benchmarks I didn't report (but did cover)
and see what happens.

> I am not suspecting you to have cherry-picked
> the best results

I'm generally very conservative when reporting *synthetic* results.
For example, the same memcached benchmark I reported for powerpc
yielded a >50% improvement when run on aarch64, because the default
Ubuntu Kconfig uses a 64KB base page size for powerpc but 4KB for
aarch64. (Before the series, the reclaim (swap) path takes
kvm->mmu_lock for *write* O(nr of all pages to consider) times; after
the series, it's O(nr of pages actually swapped), which is <10% given
how the benchmark was set up. A rough sketch of the difference is at
the end of this mail.)

              Ops/sec    Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency
  ----------------------------------------------------------------------------
  Before    639511.40         0.09940      0.04700      0.27100       22.52700
  After     974184.60         0.06471      0.04700      0.15900        3.75900

> but I'd really like to see a variety of benchmarks
> that exercise this stuff differently.

I'd be happy to try other synthetic workloads that people think are
relatively representative.

Also, I've backported the series and started an A/B experiment
involving ~1 million devices (real-world workloads). We should have
the preliminary results by the time I post the next version.
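
To make the mmu_lock point above more concrete, here is a rough sketch
of the two locking patterns. This is not the actual patch: struct
page_candidate and the secondary_mmu_*() helpers are made-up stand-ins
for the real reclaim structures and the KVM mmu-notifier callbacks, and
it assumes the rwlock_t flavor of kvm->mmu_lock (as on x86/arm64).

#include <linux/kvm_host.h>
#include <linux/list.h>

/* Hypothetical stand-ins; only the locking pattern matters here. */
struct page_candidate {
	struct list_head node;
};

bool secondary_mmu_test_clear_young(struct kvm *kvm, struct page_candidate *p);
bool secondary_mmu_test_clear_young_lockless(struct kvm *kvm, struct page_candidate *p);
void secondary_mmu_unmap(struct kvm *kvm, struct page_candidate *p);

/*
 * Before: every candidate page takes mmu_lock for write just to
 * test/clear the accessed bit in the secondary (KVM) page tables.
 */
static void age_candidates_before(struct kvm *kvm, struct list_head *candidates)
{
	struct page_candidate *p;

	list_for_each_entry(p, candidates, node) {
		write_lock(&kvm->mmu_lock);	/* O(all pages considered) */
		secondary_mmu_test_clear_young(kvm, p);
		write_unlock(&kvm->mmu_lock);
	}
}

/*
 * After: the accessed-bit test is lockless; only pages that turn out
 * to be cold, i.e. the ones actually swapped, take the lock to unmap.
 */
static void age_candidates_after(struct kvm *kvm, struct list_head *candidates)
{
	struct page_candidate *p;

	list_for_each_entry(p, candidates, node) {
		if (secondary_mmu_test_clear_young_lockless(kvm, p))
			continue;		/* recently used: keep it */
		write_lock(&kvm->mmu_lock);	/* O(pages actually swapped) */
		secondary_mmu_unmap(kvm, p);
		write_unlock(&kvm->mmu_lock);
	}
}

With 4KB base pages there are 16x more candidate pages than with 64KB,
so the smaller the page size, the more the second pattern helps.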