Michal Hocko wrote:
> On Thu 31-08-17 23:52:57, Tetsuo Handa wrote:
> [...]
> > So, this pending state seems to be caused by many concurrent allocations by !PF_WQ_WORKER
> > threads consuming too much CPU time (because they only yield CPU time by many cond_resched()
> > and one schedule_timeout_uninterruptible(1)) enough to keep schedule_timeout_uninterruptible(1)
> > by PF_WQ_WORKER threads away for order of minutes. A sort of memory allocation dependency
> > observable in the form of CPU time starvation for the worker to wake up.
>
> I do not understand this. Why is cond_resched from the user context
> insufficient to let runable kworkers to run?

cond_resched() from !PF_WQ_WORKER threads is sufficient for PF_WQ_WORKER
threads to run. But cond_resched() is not sufficient for rescuer threads
to start processing pending work. An explicit call into the scheduler
(e.g. schedule_timeout_*()) by PF_WQ_WORKER threads is needed before
rescuer threads can start processing pending work.

Since PF_WQ_WORKER threads call schedule_timeout_*() from only a few
locations (i.e. from should_reclaim_retry(), __alloc_pages_may_oom() and
out_of_memory()), it can take many seconds for PF_WQ_WORKER threads to
reach one of them when many threads (both PF_WQ_WORKER and !PF_WQ_WORKER)
are constantly switching among each other, using cond_resched() as the
switching point.

I think that if the cond_resched() calls inside the memory allocation path
were schedule_timeout_*() calls, PF_WQ_WORKER threads would reach
schedule_timeout_*() sooner and rescuer threads could start processing
pending work faster than they do now.
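
To put rough numbers on the "many seconds" above, here is a toy userspace
model. It is not kernel code and does not reflect how the workqueue code or
the scheduler really behave. It assumes a single CPU, strict round-robin
switching at cond_resched()-like yield points, and a worker that has to
pass a fixed number of such points before it reaches a
schedule_timeout_*()-like sleep. The function name and both parameters are
made up for illustration.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not kernel code).
 *
 * One worker and nr_other other threads share a single CPU and switch
 * only at cond_resched()-like yield points, in strict round-robin order.
 * The worker has to pass yields_before_sleep yield points before it
 * reaches a schedule_timeout_*()-like sleep.
 *
 * Returns the number of time slices consumed (by all threads) up to and
 * including the slice in which the worker finally sleeps.
 */
static unsigned long long
slices_until_worker_sleeps(unsigned long long nr_other,
                           unsigned long long yields_before_sleep)
{
    unsigned long long slices = 0;
    unsigned long long passed = 0;

    /* Keep the toy bounded so the arithmetic cannot overflow. */
    assert(nr_other <= 1000000ULL && yields_before_sleep <= 1000000ULL);

    for (;;) {
        slices++;                 /* the worker runs for one slice */
        if (passed == yields_before_sleep)
            return slices;        /* the worker reached its sleep point */
        passed++;                 /* the worker hits a yield point ... */
        slices += nr_other;       /* ... and every other thread runs once */
    }
}
```

Under these assumptions, with 1000 other threads and 100 yield points ahead
of the sleep, the worker needs 100101 slices before it sleeps. If the very
first yield point were itself a sleep (yields_before_sleep == 0), it would
need 1 slice. Real scheduling is of course not strict round-robin, so this
only illustrates the shape of the dependency, not actual latencies.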