Re: [PATCH] mm: Use WQ_HIGHPRI for mm_percpu_wq.

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Fri, 1 Sep 2017 07:07:25 +0900

Michal Hocko wrote:
> On Thu 31-08-17 23:52:57, Tetsuo Handa wrote:
> [...]
> > So, this pending state seems to be caused by many concurrent allocations by !PF_WQ_WORKER
> > threads consuming too much CPU time (because they only yield CPU time by many cond_resched()
> > and one schedule_timeout_uninterruptible(1)) enough to keep schedule_timeout_uninterruptible(1)
> > by PF_WQ_WORKER threads away for order of minutes. A sort of memory allocation dependency
> > observable in the form of CPU time starvation for the worker to wake up.
> 
> I do not understand this. Why is cond_resched from the user context
> insufficient to let runnable kworkers run?

cond_resched() from !PF_WQ_WORKER threads is sufficient for PF_WQ_WORKER threads to run.
But cond_resched() alone is not sufficient for rescuer threads to start processing pending
work. For that, PF_WQ_WORKER threads need to sleep explicitly (e.g. via
schedule_timeout_*()).

Since PF_WQ_WORKER threads call schedule_timeout_*() from only a few locations
(i.e. should_reclaim_retry(), __alloc_pages_may_oom() and out_of_memory()), it can
take many seconds for PF_WQ_WORKER threads to reach such a location when many threads (both
PF_WQ_WORKER and !PF_WQ_WORKER) are constantly switching among each other using cond_resched()
as the switching point. I think that if the cond_resched() calls inside the memory allocation
path were schedule_timeout_*() instead, PF_WQ_WORKER threads would be able to call
schedule_timeout_*() sooner and allow rescuer threads to start processing pending work
faster than now.
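
To illustrate the point, here is a userspace toy model, not kernel code. It
assumes, as described above, that pending work in a pool can only be picked up
once the last running worker really sleeps, and that a cond_resched()-style
yield leaves the worker counted as running. All names (toy_pool,
toy_cond_resched(), toy_sleep(), toy_iterations_until_work_starts()) are
hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one worker pool (hypothetical, not the kernel's struct). */
struct toy_pool {
	int nr_running;	/* workers the pool believes are runnable */
	int nr_pending;	/* queued work items waiting for a worker */
};

/*
 * cond_resched()-like yield: the worker stays runnable, so the pool's
 * bookkeeping does not change and nobody is woken for the pending work.
 */
static bool toy_cond_resched(struct toy_pool *pool)
{
	(void)pool;
	return false;
}

/*
 * schedule_timeout_*()-like sleep: the worker stops being runnable.
 * If it was the last running worker and work is pending, report that
 * another thread would be woken to process that work.
 */
static bool toy_sleep(struct toy_pool *pool)
{
	bool wake_other;

	assert(pool->nr_running > 0);
	pool->nr_running--;
	wake_other = (pool->nr_running == 0 && pool->nr_pending > 0);
	pool->nr_running++;	/* the sleeper becomes runnable again later */
	return wake_other;
}

/*
 * One worker loops in an allocation retry path. It really sleeps only on
 * every sleep_every-th iteration (0 means never) and otherwise just
 * yields. Returns the iteration at which the pending work can start,
 * or -1 if that does not happen within max_iters iterations.
 */
int toy_iterations_until_work_starts(int sleep_every, int max_iters)
{
	struct toy_pool pool = { .nr_running = 1, .nr_pending = 1 };

	for (int i = 1; i <= max_iters; i++) {
		bool sleeps = (sleep_every > 0 && i % sleep_every == 0);
		bool woke = sleeps ? toy_sleep(&pool) : toy_cond_resched(&pool);

		if (woke)
			return i;
	}
	return -1;
}
```

In this model, toy_iterations_until_work_starts(1, 100) lets the pending work
start at the first iteration, while a worker that only ever yields
(sleep_every == 0) never lets it start. The real scheduler and workqueue code
are far more involved; the sketch only shows why the rarity of sleep points
matters.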
