Re: mm: deadlock between get_online_cpus/pcpu_alloc

On Tue, Feb 07, 2017 at 10:23:31AM +0100, Vlastimil Babka wrote:
> > cpu offlining. I have to check the code but my impression was that WQ
> > code will ignore the cpu requested by the work item when the cpu is
> > going offline. If the offline happens while the worker function already
> > executes then it has to wait as we run with preemption disabled so we
> > should be safe here. Or am I missing something obvious?
> 
> Tejun suggested an alternative solution to avoiding get_online_cpus() in
> this thread:
> https://lkml.kernel.org/r/<20170123170329.GA7820@xxxxxxxxxxxxxxx>

But it would look like the following. The hotplug drain can be serialised
against pcpu_drain_mutex because the CPU hotplug teardown callback is
allowed to sleep.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b93879990fd..8cd8b1bbe00c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2319,9 +2319,17 @@ static void drain_pages(unsigned int cpu)
 {
 	struct zone *zone;
 
+	/*
+	 * A per-cpu drain via a workqueue from drain_all_pages can be
+	 * rescheduled onto an unrelated CPU. That allows the hotplug
+	 * operation and the drain to potentially race on the same
+	 * CPU. Serialise hotplug versus drain using pcpu_drain_mutex
+	 */
+	mutex_lock(&pcpu_drain_mutex);
 	for_each_populated_zone(zone) {
 		drain_pages_zone(cpu, zone);
 	}
+	mutex_unlock(&pcpu_drain_mutex);
 }
 
 /*
@@ -2377,13 +2385,10 @@ void drain_all_pages(struct zone *zone)
 		mutex_lock(&pcpu_drain_mutex);
 	}
 
-	get_online_cpus();
-
 	/*
-	 * We don't care about racing with CPU hotplug event
-	 * as offline notification will cause the notified
-	 * cpu to drain that CPU pcps and on_each_cpu_mask
-	 * disables preemption as part of its processing
+	 * We don't care about racing with CPU hotplug event as offline
+	 * notification will cause the notified cpu to drain that CPU pcps
+	 * and it is serialised against here via pcpu_drain_mutex.
 	 */
 	for_each_online_cpu(cpu) {
 		struct per_cpu_pageset *pcp;
@@ -2418,7 +2423,6 @@ void drain_all_pages(struct zone *zone)
 	for_each_cpu(cpu, &cpus_with_pcps)
 		flush_work(per_cpu_ptr(&pcpu_drain, cpu));
 
-	put_online_cpus();
 	mutex_unlock(&pcpu_drain_mutex);
 }
 

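For illustration only, and not part of the patch: a minimal userspace
sketch of the serialisation idea, assuming a pthread mutex as a stand-in
for pcpu_drain_mutex. All names here (drain_mutex, pcp_pages,
drain_cpu_locked, hotplug_teardown, drain_all, run_race) are hypothetical
and are not kernel APIs. The per-CPU helper expects the caller to hold the
mutex, because a non-recursive mutex taken twice on the same path would
deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_CPUS        4
#define PAGES_PER_CPU 10

/* Hypothetical stand-ins for pcpu_drain_mutex and the per-cpu page lists. */
static pthread_mutex_t drain_mutex = PTHREAD_MUTEX_INITIALIZER;
static int pcp_pages[NR_CPUS];
static int freed_pages;

/* Move one CPU's cached pages to the free pool. Caller holds drain_mutex. */
static void drain_cpu_locked(int cpu)
{
	freed_pages += pcp_pages[cpu];
	pcp_pages[cpu] = 0;
}

/* Models the hotplug teardown callback: it may sleep, so it can take the mutex. */
static void *hotplug_teardown(void *arg)
{
	int cpu = *(int *)arg;

	pthread_mutex_lock(&drain_mutex);
	drain_cpu_locked(cpu);
	pthread_mutex_unlock(&drain_mutex);
	return NULL;
}

/* Models a drain of every CPU, holding the mutex for the whole walk. */
static void *drain_all(void *arg)
{
	(void)arg;

	pthread_mutex_lock(&drain_mutex);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		drain_cpu_locked(cpu);
	pthread_mutex_unlock(&drain_mutex);
	return NULL;
}

/* Race the two paths and return the number of pages freed in total. */
int run_race(int dying_cpu)
{
	pthread_t teardown, drain;
	int rc;

	freed_pages = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		pcp_pages[cpu] = PAGES_PER_CPU;

	rc = pthread_create(&teardown, NULL, hotplug_teardown, &dying_cpu);
	assert(rc == 0);
	rc = pthread_create(&drain, NULL, drain_all, NULL);
	assert(rc == 0);

	pthread_join(teardown, NULL);
	pthread_join(drain, NULL);
	return freed_pages;
}
```

Whichever thread takes the mutex first, each CPU's pages are counted
exactly once, so run_race() should return NR_CPUS * PAGES_PER_CPU for any
valid dying_cpu.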
-- 
Mel Gorman
SUSE Labs
