On Thu, 12 May 2022 09:50:43 +0100 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> From: Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx>
>
> Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu
> drain work queued by __drain_all_pages(). So introduce a new mechanism to
> remotely drain the per-cpu lists. It is made possible by remotely locking
> 'struct per_cpu_pages' new per-cpu spinlocks. A benefit of this new scheme
> is that drain operations are now migration safe.
>
> There was no observed performance degradation vs. the previous scheme.
> Both netperf and hackbench were run in parallel to triggering the
> __drain_all_pages(NULL, true) code path around ~100 times per second.
> The new scheme performs a bit better (~5%), although the important point
> here is there are no performance regressions vs. the previous mechanism.
> Per-cpu lists draining happens only in slow paths.
>
> Minchan Kim tested this independently and reported;
>
>     My workload is not NOHZ CPUs but run apps under heavy memory
>     pressure so they goes to direct reclaim and be stuck on
>     drain_all_pages until work on workqueue run.
>
>     unit: nanosecond
>     max(dur)     avg(dur)              count(dur)
>     166713013    487511.77786438033    1283
>
>     From traces, system encountered the drain_all_pages 1283 times and
>     worst case was 166ms and avg was 487us.
>
>     The other problem was alloc_contig_range in CMA. The PCP draining
>     takes several hundred millisecond sometimes though there is no
>     memory pressure or a few of pages to be migrated out but CPU were
>     fully booked.
>
>     Your patch perfectly removed those wasted time.

I'm not getting a sense here of the overall effect upon userspace
performance. As Thomas said last year in
https://lkml.kernel.org/r/87v92sgt3n.ffs@tglx

: The changelogs and the cover letter have a distinct void vs. that which
: means this is just another example of 'scratch my itch' changes w/o
: proper justification.

Is there more to all of this than itchiness and if so, well, you know the rest ;)