Re: mm: deadlock between get_online_cpus/pcpu_alloc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote:
> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >> On 29.1.2017 13:44, Dmitry Vyukov wrote:
> >>> Hello,
> >>>
> >>> I've got the following deadlock report while running syzkaller fuzzer
> >>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
> >>>
> >>> [ INFO: possible circular locking dependency detected ]
> >>> 4.10.0-rc5-next-20170125 #1 Not tainted
> >>> -------------------------------------------------------
> >>> syz-executor3/14255 is trying to acquire lock:
> >>>  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff814271c7>]
> >>> get_online_cpus+0x37/0x90 kernel/cpu.c:239
> >>>
> >>> but task is already holding lock:
> >>>  (pcpu_alloc_mutex){+.+.+.}, at: [<ffffffff81937fee>]
> >>> pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
> >>>
> >>> which lock already depends on the new lock.
> >>
> >> I suspect the dependency comes from recent changes in drain_all_pages(). They
> >> were later redone (for other reasons, but nice to have another validation) in
> >> the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next. Could
> >> you try if it helps?
> >
> > It happened only once on linux-next, so I can't verify the fix. But I
> > will watch out for other occurrences.
> 
> Unfortunately it does not seem to help.

I'm a little stuck on how to best handle this. get_online_cpus() can
halt forever if the hotplug operation is holding the mutex when calling
pcpu_alloc. One option would be to add a try_get_online_cpus() helper which
trylocks the mutex. However, given that drain is so unlikely to actually
make that make a difference when racing against parallel allocations,
I think this should be acceptable.

Any objections?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b93879990fd..a3192447e906 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3432,7 +3432,17 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 	 */
 	if (!page && !drained) {
 		unreserve_highatomic_pageblock(ac, false);
-		drain_all_pages(NULL);
+
+		/*
+		 * Only drain from contexts allocating for user allocations.
+		 * Kernel allocations could be holding a CPU hotplug-related
+		 * mutex, particularly hot-add allocating per-cpu structures
+		 * while hotplug-related mutex's are held which would prevent
+		 * get_online_cpus ever returning.
+		 */
+		if (gfp_mask & __GFP_HARDWALL)
+			drain_all_pages(NULL);
+
 		drained = true;
 		goto retry;
 	}

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]
  Powered by Linux