CC: Mel Gorman <mgorman@xxxxxxx>
CC: linux@xxxxxxxxx

Hi all,

I've found a large regression that affects Intel Xeon's DMA Engine
performance between v4.14 LTS and modern kernels. In certain
circumstances the speed reported by dmatest is more than 6 times lower.

	- Hardware -

I did testing on 2 systems:
1) Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz (Supermicro X11DAi-N)
2) Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz (YADRO Vegman S220)

	- Measurement -

The speed reported by dmatest decreases with almost any test settings,
although the impact is most significant with 64K transfers. The
following parameters were used:

modprobe dmatest iterations=1000 timeout=2000 test_buf_size=0x100000 transfer_size=0x10000 norandom=1
echo "dma0chan0" > /sys/module/dmatest/parameters/channel
echo 1 > /sys/module/dmatest/parameters/run

Every test case was performed at least 3 times. All detailed results
are below.

	- Analysis -

Bisecting revealed 2 different bad commits for those 2 systems, but
both change the same function/condition in the same file.

For the system (1) the bad commit is:
[7332dec055f2457c386032f7e9b2991eb05c2a0a] sched/fair: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache

For the system (2) the bad commit is:
[806486c377e33ab662de6d47902e9e2a32b79368] sched/fair: Do not migrate if the prev_cpu is idle

	- Additional check -

To check this, I also tested a dirty patch against the current kernel
(v5.16.0-rc5) that reverts the changes above:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6f16dfb74246..0a58cc00b1b8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5931,8 +5931,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 	 * a cpufreq perspective, it's better to have higher utilisation
 	 * on one CPU.
 	 */
-	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
-		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
+	if (available_idle_cpu(this_cpu))
+		return this_cpu;
 
 	if (sync && cpu_rq(this_cpu)->nr_running == 1)
 		return this_cpu;

Please take a look and see whether this makes sense. With this patch
applied, DMA Engine performance is restored.

	- Dmatest results TL;DR -

System (1) before bad commit:
---------------------
[ 519.894642] dmatest: Added 1 threads using dma0chan0
[ 525.383021] dmatest: Started 1 threads using dma0chan0
[ 528.521915] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98367.10 iops 6295494 KB/s (0)
[ 544.851751] dmatest: Added 1 threads using dma0chan0
[ 546.460064] dmatest: Started 1 threads using dma0chan0
[ 549.609504] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 100310.96 iops 6419901 KB/s (0)
[ 562.178365] dmatest: Added 1 threads using dma0chan0
[ 563.852534] dmatest: Started 1 threads using dma0chan0
[ 567.004898] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98580.44 iops 6309148 KB/s (0)
---------------------

System (1) on HEAD=bad commit:
---------------------
[ 149.555401] dmatest: Added 1 threads using dma0chan0
[ 154.162444] dmatest: Started 1 threads using dma0chan0
[ 157.490868] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 26653.87 iops 1705847 KB/s (0)
[ 176.783450] dmatest: Added 1 threads using dma0chan0
[ 178.428518] dmatest: Started 1 threads using dma0chan0
[ 181.606531] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 14194.86 iops 908471 KB/s (0)
[ 192.125218] dmatest: Added 1 threads using dma0chan0
[ 194.060029] dmatest: Started 1 threads using dma0chan0
[ 197.235265] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 14757.09 iops 944454 KB/s (0)
---------------------

System (1) on v5.16.0-rc5:
---------------------
[ 1430.860170] dmatest: Added 1 threads using dma0chan0
[ 1437.367447] dmatest: Started 1 threads using dma0chan0
[ 1442.756660] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 24837.31 iops 1589588 KB/s (0)
[ 1561.614191] dmatest: Added 1 threads using dma0chan0
[ 1562.816375] dmatest: Started 1 threads using dma0chan0
[ 1566.619614] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13666.05 iops 874627 KB/s (0)
[ 1585.019601] dmatest: Added 1 threads using dma0chan0
[ 1587.585741] dmatest: Started 1 threads using dma0chan0
[ 1591.386816] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13521.91 iops 865402 KB/s (0)
---------------------

System (1) on v5.16.0-rc5 with dirty patch:
---------------------
[ 733.571508] dmatest: Added 1 threads using dma0chan0
[ 746.050800] dmatest: Started 1 threads using dma0chan0
[ 749.765600] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 87260.03 iops 5584642 KB/s (0)
[ 915.051955] dmatest: Added 1 threads using dma0chan0
[ 916.550732] dmatest: Started 1 threads using dma0chan0
[ 920.267525] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 88464.25 iops 5661712 KB/s (0)
[ 936.781273] dmatest: Added 1 threads using dma0chan0
[ 939.528616] dmatest: Started 1 threads using dma0chan0
[ 943.247694] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 88833.61 iops 5685351 KB/s (0)
---------------------

System (2) before bad commit:
---------------------
[ 481.309411] dmatest: Added 1 threads using dma0chan0
[ 491.197425] dmatest: Started 1 threads using dma0chan0
[ 497.047315] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 78988.94 iops 5055292 KB/s (0)
[ 506.057101] dmatest: Added 1 threads using dma0chan0
[ 508.939426] dmatest: Started 1 threads using dma0chan0
[ 514.788823] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 77754.44 iops 4976284 KB/s (0)
[ 531.894587] dmatest: Added 1 threads using dma0chan0
[ 534.053360] dmatest: Started 1 threads using dma0chan0
[ 539.906424] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 76988.21 iops 4927246 KB/s (0)
---------------------

System (2) on HEAD=bad commit:
---------------------
[44522.892995] dmatest: Added 1 threads using dma0chan0
[44526.193331] dmatest: Started 1 threads using dma0chan0
[44532.043932] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 80360.01 iops 5143040 KB/s (0)
[44561.121118] dmatest: Added 1 threads using dma0chan0
[44562.868428] dmatest: Started 1 threads using dma0chan0
[44568.808577] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 16080.53 iops 1029154 KB/s (0)
[44728.597409] dmatest: Added 1 threads using dma0chan0
[44730.301566] dmatest: Started 1 threads using dma0chan0
[44736.259009] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 16091.91 iops 1029882 KB/s (0)
---------------------

Thanks for reading.

--
Regards,
Alexander Fomichev
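
P.S. For comparing runs without reading the logs by eye, here is a minimal sketch in Python that averages the KB/s field of the dmatest summary lines and reports the slowdown ratio. It assumes the summary-line format shown in the logs above; the helper name `mean_kbps` and the regular expression are illustrative and not part of dmatest. The sample data is the three System (1) summary lines from before the bad commit and from v5.16.0-rc5.

```python
import re
from statistics import mean

# Assumed format, taken from the summary lines in the logs above:
#   "... summary 1000 tests, 0 failures 98367.10 iops 6295494 KB/s (0)"
SUMMARY_RE = re.compile(
    r"summary \d+ tests, \d+ failures [\d.]+ iops (\d+) KB/s"
)


def mean_kbps(log_text):
    """Return the mean KB/s over all dmatest summary lines in log_text."""
    rates = [int(m.group(1)) for m in SUMMARY_RE.finditer(log_text)]
    if not rates:
        raise ValueError("no dmatest summary lines found")
    return mean(rates)


# System (1), before the bad commit (summary lines only).
before = """
[ 528.521915] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98367.10 iops 6295494 KB/s (0)
[ 549.609504] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 100310.96 iops 6419901 KB/s (0)
[ 567.004898] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98580.44 iops 6309148 KB/s (0)
"""

# System (1), v5.16.0-rc5 without the dirty patch (summary lines only).
after = """
[ 1442.756660] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 24837.31 iops 1589588 KB/s (0)
[ 1566.619614] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13666.05 iops 874627 KB/s (0)
[ 1591.386816] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13521.91 iops 865402 KB/s (0)
"""

# Ratio of mean throughput before the regression to mean throughput after it.
ratio = mean_kbps(before) / mean_kbps(after)
print(f"mean before: {mean_kbps(before):.0f} KB/s")
print(f"mean after:  {mean_kbps(after):.0f} KB/s")
print(f"slowdown:    {ratio:.2f}x")
```

The same helper could be fed the output of `dmesg` directly, since lines that are not summary lines are ignored by the pattern.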