On 6/24/22 14:54, Mel Gorman wrote:
> From: Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx>
>
> Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu
> drain work queued by __drain_all_pages(). So introduce a new mechanism to
> remotely drain the per-cpu lists. It is made possible by remotely locking
> 'struct per_cpu_pages' new per-cpu spinlocks. A benefit of this new
> scheme is that drain operations are now migration safe.
>
> There was no observed performance degradation vs. the previous scheme.
> Both netperf and hackbench were run in parallel to triggering the
> __drain_all_pages(NULL, true) code path around ~100 times per second. The
> new scheme performs a bit better (~5%), although the important point here
> is there are no performance regressions vs. the previous mechanism.
> Per-cpu lists draining happens only in slow paths.
>
> Minchan Kim tested an earlier version and reported;
>
>     My workload is not NOHZ CPUs but run apps under heavy memory
>     pressure so they goes to direct reclaim and be stuck on
>     drain_all_pages until work on workqueue run.
>
>     unit: nanosecond
>     max(dur)     avg(dur)             count(dur)
>     166713013    487511.77786438033   1283
>
>     From traces, system encountered the drain_all_pages 1283 times and
>     worst case was 166ms and avg was 487us.
>
>     The other problem was alloc_contig_range in CMA. The PCP draining
>     takes several hundred millisecond sometimes though there is no
>     memory pressure or a few of pages to be migrated out but CPU were
>     fully booked.
>
>     Your patch perfectly removed those wasted time.
>
> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>