On 02/07/2017 09:48 AM, Michal Hocko wrote:
> On Mon 06-02-17 22:05:30, Mel Gorman wrote:
>>> Unfortunately it does not seem to help.
>>
>> I'm a little stuck on how to best handle this. get_online_cpus() can
>> halt forever if the hotplug operation is holding the mutex when calling
>> pcpu_alloc. One option would be to add a try_get_online_cpus() helper which
>> trylocks the mutex. However, given that drain is so unlikely to actually
>> make that make a difference when racing against parallel allocations,
>> I think this should be acceptable.
>>
>> Any objections?
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3b93879990fd..a3192447e906 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3432,7 +3432,17 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>>  	 */
>>  	if (!page && !drained) {
>>  		unreserve_highatomic_pageblock(ac, false);
>> -		drain_all_pages(NULL);
>> +
>> +		/*
>> +		 * Only drain from contexts allocating for user allocations.
>> +		 * Kernel allocations could be holding a CPU hotplug-related
>> +		 * mutex, particularly hot-add allocating per-cpu structures
>> +		 * while hotplug-related mutex's are held which would prevent
>> +		 * get_online_cpus ever returning.
>> +		 */
>> +		if (gfp_mask & __GFP_HARDWALL)
>> +			drain_all_pages(NULL);
>> +
>
> This wouldn't work AFAICS. If you look at the lockdep splat, the path
> which reverses the locking order (takes pcpu_alloc_mutex prior to
> cpu_hotplug.lock is bpf_array_alloc_percpu which is GFP_USER and thus
> __GFP_HARDWALL.
>
> I believe we shouldn't pull any dependency on the hotplug locks inside
> the allocator. This is just too fragile! Can we simply drop the
> get_online_cpus()? Why do we need it, anyway? Say we are racing with the

It was added after I noticed in review that queue_work_on() has a
comment that the caller must ensure that the cpu can't go away, and
wondered about it. I also noted that the similar lru_add_drain_all()
does it too.

> cpu offlining.
> I have to check the code but my impression was that WQ
> code will ignore the cpu requested by the work item when the cpu is
> going offline. If the offline happens while the worker function already
> executes then it has to wait as we run with preemption disabled so we
> should be safe here. Or am I missing something obvious?

Tejun suggested an alternative solution to avoiding get_online_cpus() in
this thread:
https://lkml.kernel.org/r/<20170123170329.GA7820@xxxxxxxxxxxxxxx>
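
For reference, the try_get_online_cpus() idea from the quoted text could
look roughly like the userspace sketch below. This is only an
illustration under stated assumptions, not kernel code: the hotplug lock
is modelled with a C11 atomic_flag, and drain_all_pages_sketch() and
drains_done are hypothetical names made up for this example. The point
is just that the drain is skipped, rather than blocking, whenever the
hotplug side already holds the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the CPU hotplug lock; NOT the kernel's cpu_hotplug.lock. */
static atomic_flag hotplug_lock = ATOMIC_FLAG_INIT;

/* Hypothetical counter so the effect of a drain can be observed. */
static int drains_done;

/* Trylock variant: returns false instead of waiting if the lock is held. */
static bool try_get_online_cpus(void)
{
	return !atomic_flag_test_and_set_explicit(&hotplug_lock,
						  memory_order_acquire);
}

static void put_online_cpus(void)
{
	atomic_flag_clear_explicit(&hotplug_lock, memory_order_release);
}

/*
 * Hypothetical drain path: if hotplug holds the lock, skip the drain
 * and let the caller carry on, so the allocator never waits on the
 * hotplug lock.
 */
static bool drain_all_pages_sketch(void)
{
	if (!try_get_online_cpus())
		return false;

	drains_done++;		/* placeholder for the real per-cpu drain */
	put_online_cpus();
	return true;
}
```

A caller would treat a false return as "drain skipped" and continue with
the allocation retry as if the drain had made no difference.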