On Fri, Jan 21, 2022 at 06:12:17PM +0800, Hillf Danton wrote:
> On Wed, 19 Jan 2022 15:55:13 +0300 Alexander Fomichev wrote:
> >On Tue, Jan 18, 2022 at 10:04:48AM +0800, Hillf Danton wrote:
> >> On Mon, 17 Jan 2022 20:44:19 +0300 Alexander Fomichev wrote:
> >> > On Mon, Jan 17, 2022 at 10:27:01AM +0000, Mel Gorman wrote:
> >> >
> >> > -----< v5.15.8-vanilla >-----
> >> > [17057.866760] dmatest: Added 1 threads using dma0chan0
> >> > [17060.133880] dmatest: Started 1 threads using dma0chan0
> >> > [17060.154343] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 49338.85 iops 3157686 KB/s (0)
> >> > [17063.737887] dmatest: Added 1 threads using dma0chan0
> >> > [17065.113838] dmatest: Started 1 threads using dma0chan0
> >> > [17065.137659] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 42183.41 iops 2699738 KB/s (0)
> >> > [17100.339989] dmatest: Added 1 threads using dma0chan0
> >> > [17102.190764] dmatest: Started 1 threads using dma0chan0
> >> > [17102.214285] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 42844.89 iops 2742073 KB/s (0)
> >> > -----< end >-----
> >> >
> >
> >Just to remind, used dmatest parameters:
> >
> >/sys/module/dmatest/parameters/iterations:1000
> >/sys/module/dmatest/parameters/alignment:-1
> >/sys/module/dmatest/parameters/verbose:N
> >/sys/module/dmatest/parameters/norandom:Y
> >/sys/module/dmatest/parameters/max_channels:0
> >/sys/module/dmatest/parameters/dmatest:0
> >/sys/module/dmatest/parameters/polled:N
> >/sys/module/dmatest/parameters/threads_per_chan:1
> >/sys/module/dmatest/parameters/noverify:Y
> >/sys/module/dmatest/parameters/test_buf_size:1048576
> >/sys/module/dmatest/parameters/transfer_size:65536
> >/sys/module/dmatest/parameters/run:N
> >/sys/module/dmatest/parameters/wait:Y
> >/sys/module/dmatest/parameters/timeout:2000
> >/sys/module/dmatest/parameters/xor_sources:3
> >/sys/module/dmatest/parameters/pq_sources:3
> >
> See if tuning back down 10 degree can close the gap in iops, in the
> assumption that the prev CPU can be ignored in case of cold cache.
>
> Also want to see the diff in output of "cat /proc/interrupts" before
> and after dmatest, wondering if the dma irq is bond to a CPU core of
> dancing on several ones.
>
> Hillf
>
> +++ x/kernel/sched/fair.c
> @@ -5888,20 +5888,10 @@ static int wake_wide(struct task_struct
>  static int
>  wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>  {
> -	/*
> -	 * If this_cpu is idle, it implies the wakeup is from interrupt
> -	 * context. Only allow the move if cache is shared. Otherwise an
> -	 * interrupt intensive workload could force all tasks onto one
> -	 * node depending on the IO topology or IRQ affinity settings.
> -	 *
> -	 * If the prev_cpu is idle and cache affine then avoid a migration.
> -	 * There is no guarantee that the cache hot data from an interrupt
> -	 * is more important than cache hot data on the prev_cpu and from
> -	 * a cpufreq perspective, it's better to have higher utilisation
> -	 * on one CPU.
> -	 */
> -	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
> -		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> +	/* select this cpu because of cold cache */
> +	if (cpus_share_cache(this_cpu, prev_cpu))
> +		if (available_idle_cpu(this_cpu))
> +			return this_cpu;
>
>  	if (sync && cpu_rq(this_cpu)->nr_running == 1)
>  		return this_cpu;
> --

Hi Hillf,

Thanks for the information.
With the recent patch (I called it patch2) the results are as follows:

-----< 5.15.8-Hillf-Danton-patch2+ noverify=Y >-----
[ 646.568455] dmatest: Added 1 threads using dma0chan0
[ 661.127077] dmatest: Started 1 threads using dma0chan0
[ 661.147156] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 50251.25 iops 3216080 KB/s (0)
[ 675.132323] dmatest: Added 1 threads using dma0chan0
[ 676.205829] dmatest: Started 1 threads using dma0chan0
[ 676.225991] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 50022.50 iops 3201440 KB/s (0)
[ 703.100813] dmatest: Added 1 threads using dma0chan0
[ 704.933579] dmatest: Started 1 threads using dma0chan0
[ 704.953733] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 49950.04 iops 3196803 KB/s (0)
-----< end >-----

I have also re-run the test with the 'noverify=N' option, just for
illustration.

-----< 5.15.8-Hillf-Danton-patch2+ noverify=N >-----
[ 1614.739687] dmatest: Added 1 threads using dma0chan0
[ 1620.346536] dmatest: Started 1 threads using dma0chan0
[ 1623.254880] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 23544.92 iops 1506875 KB/s (0)
[ 1634.974200] dmatest: Added 1 threads using dma0chan0
[ 1635.981532] dmatest: Started 1 threads using dma0chan0
[ 1638.892182] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 23703.98 iops 1517055 KB/s (0)
[ 1652.878143] dmatest: Added 1 threads using dma0chan0
[ 1655.235130] dmatest: Started 1 threads using dma0chan0
[ 1658.143206] dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 23526.64 iops 1505705 KB/s (0)
-----< end >-----

/proc/interrupts changes before/after the test:

-----< interrupts.diff >-----
- 184: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6000 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI 103813120-edge 0000:c6:00.2
+ 184: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9000 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI 103813120-edge 0000:c6:00.2
-----< end >-----

It looks like the MSI handler is called on the same CPU all the time.

-- 
Regards,
  Alexander