On Thu 2022-02-24 17:28:19, Suren Baghdasaryan wrote:
> Sending as an RFC to confirm if this is the right direction and to
> clarify if other tasks currently executed on mm_percpu_wq should be
> also moved to kthreads. The patch seems stable in testing but I want
> to collect more performance data before submitting a non-RFC version.
> 
> 
> Currently drain_all_pages uses mm_percpu_wq to drain pages from pcp
> list during direct reclaim. The tasks on a workqueue can be delayed
> by other tasks in the workqueues using the same per-cpu worker pool.
> This results in sizable delays in drain_all_pages when cpus are highly
> contended.
> Memory management operations designed to relieve memory pressure should
> not be blocked by other tasks, especially if the task in direct
> reclaim has higher priority than the blocking tasks.
> Replace the usage of mm_percpu_wq with per-cpu low priority FIFO
> kthreads to execute draining tasks.
> 
> Suggested-by: Petr Mladek <pmladek@xxxxxxxx>
> Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>

The patch looks good to me. See a few comments below about things where
I was in doubt. But I do not see any real problem with this approach.

> ---
>  mm/page_alloc.c | 84 ++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 70 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3589febc6d31..c9ab2cf4b05b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2209,6 +2210,58 @@ _deferred_grow_zone(struct zone *zone, unsigned int order)
>  
>  #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>  
> +static void drain_local_pages_func(struct kthread_work *work);
> +
> +static int alloc_drain_worker(unsigned int cpu)
> +{
> +	struct pcpu_drain *drain;
> +
> +	mutex_lock(&pcpu_drain_mutex);
> +	drain = per_cpu_ptr(&pcpu_drain, cpu);
> +	drain->worker = kthread_create_worker_on_cpu(cpu, 0, "pg_drain/%u", cpu);
> +	if (IS_ERR(drain->worker)) {
> +		drain->worker = NULL;
> +		pr_err("Failed to create pg_drain/%u\n", cpu);
> +		goto out;
> +	}
> +	/* Ensure the thread is not blocked by normal priority tasks */
> +	sched_set_fifo_low(drain->worker->task);
> +	kthread_init_work(&drain->work, drain_local_pages_func);
> +out:
> +	mutex_unlock(&pcpu_drain_mutex);
> +
> +	return 0;
> +}
> +
> +static int free_drain_worker(unsigned int cpu)
> +{
> +	struct pcpu_drain *drain;
> +
> +	mutex_lock(&pcpu_drain_mutex);
> +	drain = per_cpu_ptr(&pcpu_drain, cpu);
> +	kthread_cancel_work_sync(&drain->work);

I do not see how CPU down was handled in the original code.

Note that workqueues call unbind_workers() when a CPU is going down.
The pending work items might be processed on another CPU. From this
POV, the new code looks safer.

> +	kthread_destroy_worker(drain->worker);
> +	drain->worker = NULL;
> +	mutex_unlock(&pcpu_drain_mutex);
> +
> +	return 0;
> +}
> +
> +static void __init init_drain_workers(void)
> +{
> +	unsigned int cpu;
> +
> +	for_each_online_cpu(cpu)
> +		alloc_drain_worker(cpu);

I thought about whether this needs to be called under cpus_read_lock(),
and I think that the code is safe as it is. There is this call chain:

  + kernel_init_freeable()
    + page_alloc_init_late()
      + init_drain_workers()

It is called after smp_init() but before the init process is executed.
I guess that nobody could trigger CPU hotplug at this stage, so there
is no need to synchronize against it.
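Just in case this assumption ever stops holding, the usual defensive
pattern would be to take the hotplug lock around the loop and use the
_cpuslocked variant for the registration below. A rough sketch, untested
and only to illustrate the pattern:

	static void __init init_drain_workers(void)
	{
		unsigned int cpu;

		/* Block CPU hotplug so the loop cannot race with the callbacks. */
		cpus_read_lock();
		for_each_online_cpu(cpu)
			alloc_drain_worker(cpu);

		/* Dynamic states return the allocated state (> 0) on success. */
		if (cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN,
							 "page_alloc/drain:online",
							 alloc_drain_worker,
							 free_drain_worker) < 0)
			pr_err("page_alloc_drain: Failed to allocate a hotplug state\n");
		cpus_read_unlock();
	}

But again, I do not think it is needed at this point of the boot.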
> +
> +	if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> +					"page_alloc/drain:online",
> +					alloc_drain_worker,
> +					free_drain_worker)) {
> +		pr_err("page_alloc_drain: Failed to allocate a hotplug state\n");

I am not sure if there are any special requirements about the ordering
vs. other CPU hotplug operations.

Just note that the per-CPU workqueues are started/stopped via
CPUHP_AP_WORKQUEUE_ONLINE. They become available slightly earlier,
before CPUHP_AP_ONLINE_DYN, when a CPU is being brought online.

> +	}
> +}
> +
>  void __init page_alloc_init_late(void)
>  {
>  	struct zone *zone;

Best Regards,
Petr
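PS: If the ordering vs. the workqueue state ever turns out to matter,
the usual way to pin it down is a dedicated hotplug state instead of
CPUHP_AP_ONLINE_DYN. A rough sketch; the state name is made up:

	/*
	 * include/linux/cpuhotplug.h: a new entry in enum cpuhp_state,
	 * placed after CPUHP_AP_WORKQUEUE_ONLINE (hypothetical name):
	 */
	CPUHP_AP_PAGE_ALLOC_DRAIN_ONLINE,

	/* mm/page_alloc.c: register at the fixed state: */
	if (cpuhp_setup_state_nocalls(CPUHP_AP_PAGE_ALLOC_DRAIN_ONLINE,
				      "page_alloc/drain:online",
				      alloc_drain_worker,
				      free_drain_worker))
		pr_err("page_alloc_drain: Failed to register hotplug callbacks\n");

With a fixed (non-dynamic) state, a non-zero return value really is an
error. Note that CPUHP_AP_ONLINE_DYN returns the allocated state number
(> 0) on success, so the error check in the patch above might need a
"< 0" comparison anyway.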