[RFC] Scheduler: DMA Engine regression because of sched/fair changes

CC: Mel Gorman <mgorman@xxxxxxx>
CC: linux@xxxxxxxxx

Hi all,

I have found a large regression in Intel Xeon DMA Engine performance
between v4.14 LTS and current kernels. In certain circumstances the
throughput reported by dmatest is more than 6 times lower.

	- Hardware -
I did testing on 2 systems:
1) Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz (Supermicro X11DAi-N)
2) Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz (YADRO Vegman S220)

	- Measurement -
The throughput reported by dmatest drops with almost any test settings,
but the impact is most significant with 64K transfers. The following
parameters were used:

modprobe dmatest iterations=1000 timeout=2000 test_buf_size=0x100000 transfer_size=0x10000 norandom=1
echo "dma0chan0" > /sys/module/dmatest/parameters/channel
echo 1 > /sys/module/dmatest/parameters/run
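
For reference, the hex parameters above decode to a 64 KiB transfer
inside a 1 MiB test buffer, and the KB/s figure in the dmatest summary
lines is consistent with iops multiplied by the transfer size in KiB.
A small Python sketch of that arithmetic follows. The helper name is
made up for illustration, and the formula is inferred from the numbers
in this report, not taken from the dmatest source:

```python
# Parameters as passed to modprobe in this report.
TEST_BUF_SIZE = 0x100000   # bytes
TRANSFER_SIZE = 0x10000    # bytes

def kb_per_sec(iops, transfer_size=TRANSFER_SIZE):
    """Hypothetical helper: throughput in KB/s.

    Assumes 1 KB = 1024 bytes and that every operation moves exactly
    transfer_size bytes (norandom=1 was set for these runs).
    """
    return iops * transfer_size / 1024

print(TRANSFER_SIZE // 1024)       # 64   (KiB per transfer)
print(TEST_BUF_SIZE // 1024)       # 1024 (KiB per test buffer)
print(int(kb_per_sec(98367.10)))   # 6295494, as in the first system (1) summary line
```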

Every test case was run at least 3 times. Detailed results are below.

	- Analysis -
Bisecting revealed 2 different bad commits for those 2 systems, but both
change the same function/condition in the same file.
For system (1), the bad commit is:
[7332dec055f2457c386032f7e9b2991eb05c2a0a] sched/fair: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache
For system (2), the bad commit is:
[806486c377e33ab662de6d47902e9e2a32b79368] sched/fair: Do not migrate if the prev_cpu is idle

	- Additional check -
To check this, I also tested a dirty patch against the current kernel
(v5.16.0-rc5) that reverts the changes above:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6f16dfb74246..0a58cc00b1b8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5931,8 +5931,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
         * a cpufreq perspective, it's better to have higher utilisation
         * on one CPU.
         */
-       if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
-               return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
+       if (available_idle_cpu(this_cpu))
+               return this_cpu;

        if (sync && cpu_rq(this_cpu)->nr_running == 1)
                return this_cpu;

Please take a look and check whether this makes sense. With this patch
applied, DMA Engine throughput on system (1) recovers to about 5.6M KB/s,
close to the roughly 6.3M KB/s measured before the bad commit.
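
To make the behavioural difference explicit, here is a user-space
Python model of only the first check in wake_affine_idle(), before and
after the dirty patch. It is a sketch: the three booleans stand in for
available_idle_cpu(this_cpu), available_idle_cpu(prev_cpu) and
cpus_share_cache(this_cpu, prev_cpu), the function names are
hypothetical, and None means "this check did not decide, fall through to
the rest of the function", which is not modelled here.

```python
from itertools import product

def upstream_first_check(this_idle, prev_idle, share_cache):
    """First check as in v5.16-rc5 (the '-' lines of the diff)."""
    if this_idle and share_cache:
        return "prev_cpu" if prev_idle else "this_cpu"
    return None  # fall through to the later checks

def patched_first_check(this_idle, prev_idle, share_cache):
    """First check with the dirty patch applied (the '+' lines)."""
    if this_idle:
        return "this_cpu"
    return None  # fall through to the later checks

def differing_cases():
    """All input combinations where the two variants disagree."""
    return [
        (args, upstream_first_check(*args), patched_first_check(*args))
        for args in product([True, False], repeat=3)
        if upstream_first_check(*args) != patched_first_check(*args)
    ]

for (this_idle, prev_idle, share_cache), old, new in differing_cases():
    print(f"this_idle={this_idle} prev_idle={prev_idle} "
          f"share_cache={share_cache}: upstream={old} patched={new}")
```

In this model the two variants disagree only when this_cpu is idle: the
patched version always picks this_cpu, while upstream either prefers an
idle prev_cpu or, when the CPUs do not share cache, leaves the decision
to the later checks.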

	- Dmatest results TL;DR -

System (1) before bad commit:
---------------------
[  519.894642] dmatest: Added 1 threads using dma0chan0
[  525.383021] dmatest: Started 1 threads using dma0chan0
[  528.521915] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98367.10 iops 6295494 KB/s (0)
[  544.851751] dmatest: Added 1 threads using dma0chan0
[  546.460064] dmatest: Started 1 threads using dma0chan0
[  549.609504] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 100310.96 iops 6419901 KB/s (0)
[  562.178365] dmatest: Added 1 threads using dma0chan0
[  563.852534] dmatest: Started 1 threads using dma0chan0
[  567.004898] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98580.44 iops 6309148 KB/s (0)
---------------------

System (1) on HEAD=bad commit:
---------------------
[  149.555401] dmatest: Added 1 threads using dma0chan0
[  154.162444] dmatest: Started 1 threads using dma0chan0
[  157.490868] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 26653.87 iops 1705847 KB/s (0)
[  176.783450] dmatest: Added 1 threads using dma0chan0
[  178.428518] dmatest: Started 1 threads using dma0chan0
[  181.606531] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 14194.86 iops 908471 KB/s (0)
[  192.125218] dmatest: Added 1 threads using dma0chan0
[  194.060029] dmatest: Started 1 threads using dma0chan0
[  197.235265] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 14757.09 iops 944454 KB/s (0)
---------------------

System (1) on v5.16.0-rc5:
---------------------
[ 1430.860170] dmatest: Added 1 threads using dma0chan0
[ 1437.367447] dmatest: Started 1 threads using dma0chan0
[ 1442.756660] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 24837.31 iops 1589588 KB/s (0)
[ 1561.614191] dmatest: Added 1 threads using dma0chan0
[ 1562.816375] dmatest: Started 1 threads using dma0chan0
[ 1566.619614] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13666.05 iops 874627 KB/s (0)
[ 1585.019601] dmatest: Added 1 threads using dma0chan0
[ 1587.585741] dmatest: Started 1 threads using dma0chan0
[ 1591.386816] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13521.91 iops 865402 KB/s (0)
---------------------

System (1) on v5.16.0-rc5 with dirty patch:
---------------------
[  733.571508] dmatest: Added 1 threads using dma0chan0
[  746.050800] dmatest: Started 1 threads using dma0chan0
[  749.765600] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 87260.03 iops 5584642 KB/s (0)
[  915.051955] dmatest: Added 1 threads using dma0chan0
[  916.550732] dmatest: Started 1 threads using dma0chan0
[  920.267525] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 88464.25 iops 5661712 KB/s (0)
[  936.781273] dmatest: Added 1 threads using dma0chan0
[  939.528616] dmatest: Started 1 threads using dma0chan0
[  943.247694] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 88833.61 iops 5685351 KB/s (0)
---------------------

System (2) before bad commit:
---------------------
[  481.309411] dmatest: Added 1 threads using dma0chan0
[  491.197425] dmatest: Started 1 threads using dma0chan0
[  497.047315] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 78988.94 iops 5055292 KB/s (0)
[  506.057101] dmatest: Added 1 threads using dma0chan0
[  508.939426] dmatest: Started 1 threads using dma0chan0
[  514.788823] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 77754.44 iops 4976284 KB/s (0)
[  531.894587] dmatest: Added 1 threads using dma0chan0
[  534.053360] dmatest: Started 1 threads using dma0chan0
[  539.906424] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 76988.21 iops 4927246 KB/s (0)
---------------------

System (2) on HEAD=bad commit:
---------------------
[44522.892995] dmatest: Added 1 threads using dma0chan0
[44526.193331] dmatest: Started 1 threads using dma0chan0
[44532.043932] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 80360.01 iops 5143040 KB/s (0)
[44561.121118] dmatest: Added 1 threads using dma0chan0
[44562.868428] dmatest: Started 1 threads using dma0chan0
[44568.808577] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 16080.53 iops 1029154 KB/s (0)
[44728.597409] dmatest: Added 1 threads using dma0chan0
[44730.301566] dmatest: Started 1 threads using dma0chan0
[44736.259009] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 16091.91 iops 1029882 KB/s (0)
---------------------
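
The slowdown can be estimated directly from the summary lines above. The
following Python sketch parses the KB/s figures for system (1) and
compares the runs before the bad commit with those on v5.16.0-rc5. The
regular expression and function names are my own assumptions about the
log format shown in this report, not part of dmatest:

```python
import re

# Matches the dmatest summary lines as they appear in this report.
SUMMARY_RE = re.compile(
    r"summary (\d+) tests, (\d+) failures ([\d.]+) iops (\d+) KB/s"
)

def parse_kbps(log_text):
    """Return the KB/s value of every summary line found in log_text."""
    return [int(m.group(4)) for m in SUMMARY_RE.finditer(log_text)]

def mean(values):
    return sum(values) / len(values)

GOOD_LOG = """
[  528.521915] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98367.10 iops 6295494 KB/s (0)
[  549.609504] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 100310.96 iops 6419901 KB/s (0)
[  567.004898] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 98580.44 iops 6309148 KB/s (0)
"""

BAD_LOG = """
[ 1442.756660] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 24837.31 iops 1589588 KB/s (0)
[ 1566.619614] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13666.05 iops 874627 KB/s (0)
[ 1591.386816] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 13521.91 iops 865402 KB/s (0)
"""

good = parse_kbps(GOOD_LOG)
bad = parse_kbps(BAD_LOG)

# Slowdown over all three runs, and over the later runs only.
print(round(mean(good) / mean(bad), 1))      # 5.7
print(round(mean(good) / mean(bad[1:]), 1))  # 7.3
```

On both affected kernels the first run after boot is noticeably faster
than the following ones, so the "more than 6 times" figure corresponds
to the later runs.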

Thanks for reading.

-- 
Regards,
  Alexander Fomichev


