On Mon 28-08-17 20:53:30, Tetsuo Handa wrote:
> I noticed that a drain_local_pages_wq work item got stuck for minutes even
> though it is on the WQ_MEM_RECLAIM mm_percpu_wq workqueue. Tejun Heo
> pointed out [1]:
>
>   A rescuer helps if the worker pool that the workqueue is associated with
>   hangs. If we have other work items actively running, e.g. for reclaim
>   on the pool, the pool isn't stalled and rescuers won't be woken up. If
>   the work items need preferential execution, it should use WQ_HIGHPRI.
>
> Since work items on the mm_percpu_wq workqueue are expected to be executed
> as soon as possible, let's use WQ_HIGHPRI. Note that even with WQ_HIGHPRI,
> up to a few seconds of delay seems to be unavoidable.

I am not sure I understand how WQ_HIGHPRI actually helps. The work item
will get served by a thread with higher priority and from a different pool
than regular WQs. But what prevents the same issue as described above when
the highprio pool gets congested? In other words, what makes WQ_HIGHPRI
less prone to long stalls when we are in a low-memory situation and new
workers cannot be allocated?

> If we do want to make sure that work items on the mm_percpu_wq workqueue
> are executed without delays, we need to consider using kthread_workers
> instead of a workqueue. (Or, maybe we can somehow share one kthread by
> constantly manipulating its cpumask?)

Hmm, that doesn't sound like a bad idea to me. We already have a rescuer
thread that basically sits idle all the time, so having a dedicated kernel
thread will not be more expensive wrt. resources. So I think this is a more
reasonable approach than playing with WQ_HIGHPRI, which smells more like an
obscure workaround than a real fix to me.

> [1] http://lkml.kernel.org/r/201707111951.IHA98084.OHQtVOFJMLOSFF@xxxxxxxxxxxxxxxxxxx
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxx>
> ---
>  mm/vmstat.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4bb13e7..cb7e198 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1923,7 +1923,8 @@ void __init init_mm_internals(void)
>  {
>  	int ret __maybe_unused;
>  
> -	mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);
> +	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
> +				       WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
>  
>  #ifdef CONFIG_SMP
>  	ret = cpuhp_setup_state_nocalls(CPUHP_MM_VMSTAT_DEAD, "mm/vmstat:dead",
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs
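
For illustration of the kthread_worker direction discussed above, here is a
minimal, untested sketch. The names (pcp_drain_worker, pcp_drain_work,
pcp_drain_fn, request_pcp_drain) and the wiring are assumptions made for the
example, not part of Tetsuo's patch, and the per-CPU affinity that draining
actually requires is deliberately glossed over.

	/*
	 * Illustrative sketch only: drain per-cpu pages from a dedicated
	 * kthread_worker instead of the mm_percpu_wq workqueue.  All names
	 * are made up; per-CPU affinity handling is intentionally omitted.
	 */
	#include <linux/err.h>
	#include <linux/gfp.h>
	#include <linux/init.h>
	#include <linux/kthread.h>

	static struct kthread_worker *pcp_drain_worker;
	static struct kthread_work pcp_drain_work;

	static void pcp_drain_fn(struct kthread_work *work)
	{
		/* drain the pcp lists of the CPU this thread currently runs on */
		drain_local_pages(NULL);
	}

	static int __init pcp_drain_init(void)
	{
		/*
		 * The kernel thread is created once at boot, so queueing work
		 * later never depends on spawning a new worker while memory
		 * is tight.
		 */
		pcp_drain_worker = kthread_create_worker(0, "pcp_drain");
		if (IS_ERR(pcp_drain_worker))
			return PTR_ERR(pcp_drain_worker);

		kthread_init_work(&pcp_drain_work, pcp_drain_fn);
		return 0;
	}

	/* caller side, e.g. from the drain path: run on the dedicated thread */
	static void request_pcp_drain(void)
	{
		kthread_queue_work(pcp_drain_worker, &pcp_drain_work);
	}

The point of the dedicated worker is that, like a rescuer, its thread exists
from init time onwards, so queueing work onto it never waits for the
workqueue pool to create new workers under memory pressure.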