On Fri, Jan 01, 2016 at 11:36:13AM +0200, Michael S. Tsirkin wrote: > On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote: > > In balloon_page_dequeue, pages_lock should cover the loop > > (ie, list_for_each_entry_safe). Otherwise, the cursor page could > > be isolated by compaction and then list_del by isolation could > > poison the page->lru.{prev,next} so the loop finally could > > access wrong address like this. This patch fixes the bug. > > > > general protection fault: 0000 [#1] SMP > > Dumping ftrace buffer: > > (ftrace buffer empty) > > Modules linked in: > > CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > > task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000 > > RIP: 0010:[<ffffffff8115e754>] [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130 > > RSP: 0018:ffff8800a7fefdc0 EFLAGS: 00010246 > > RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d > > RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68 > > RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000 > > R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0 > > R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060 > > FS: 0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0 > > Stack: > > 0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060 > > 0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020 > > ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060 > > Call Trace: > > [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0 > > [<ffffffff812c8bc7>] balloon+0x217/0x2a0 > > [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0 > > [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0 > > [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0 > > [<ffffffff8105b6e9>] kthread+0xc9/0xe0 > > [<ffffffff8105b620>] ? kthread_park+0x60/0x60 > > [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70 > > [<ffffffff8105b620>] ? kthread_park+0x60/0x60 > > Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89 > > RIP [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130 > > RSP <ffff8800a7fefdc0> > > ---[ end trace 43cf28060d708d5f ]--- > > Kernel panic - not syncing: Fatal exception > > Dumping ftrace buffer: > > (ftrace buffer empty) > > Kernel Offset: disabled > > > > Cc: <stable@xxxxxxxxxxxxxxx> > > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx> > > --- > > mm/balloon_compaction.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c > > index d3116be5a00f..300117f1a08f 100644 > > --- a/mm/balloon_compaction.c > > +++ b/mm/balloon_compaction.c > > @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info) > > bool dequeued_page; > > > > dequeued_page = false; > > + spin_lock_irqsave(&b_dev_info->pages_lock, flags); > > list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) { > > /* > > * Block others from accessing the 'page' while we get around > > @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info) > > continue; > > } > > #endif > > - spin_lock_irqsave(&b_dev_info->pages_lock, flags); > > balloon_page_delete(page); > > __count_vm_event(BALLOON_DEFLATE); > > - spin_unlock_irqrestore(&b_dev_info->pages_lock, flags); > > unlock_page(page); > > dequeued_page = true; > > break; > > } > > } > > + spin_unlock_irqrestore(&b_dev_info->pages_lock, flags); > > > > if (!dequeued_page) { > > /* > > I think this will cause deadlocks. > > pages_lock now nests within page lock, balloon_page_putback > nests them in the reverse order. > > Did you test this with lockdep? You really should for > locking changes, and I'd expect it to warn about this. > > Also, there's another issue there I think: after isolation page could > also get freed before we try to lock it. > > We really must take a page reference before touching > the page. > > I think we need something like the below to fix this issue. > Could you please try this out, and send Tested-by? > I will repost as a proper patch if this works for you. > Nice catch! Thanks for spotting it. I just have one minor nit. See below > > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c > index d3116be..66d69c5 100644 > --- a/mm/balloon_compaction.c > +++ b/mm/balloon_compaction.c > @@ -56,12 +56,34 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue); > */ > struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info) > { > - struct page *page, *tmp; > + struct page *page; > unsigned long flags; > bool dequeued_page; > + LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */ > > dequeued_page = false; > - list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) { > + /* > + * We need to go over b_dev_info->pages and lock each page, > + * but b_dev_info->pages_lock must nest within page lock. > + * > + * To make this safe, remove each page from b_dev_info->pages list > + * under b_dev_info->pages_lock, then drop this lock. Once list is > + * empty, re-add them also under b_dev_info->pages_lock. > + */ > + spin_lock_irqsave(&b_dev_info->pages_lock, flags); > + while (!list_empty(&b_dev_info->pages)) { > + page = list_first_entry(&b_dev_info->pages, typeof(*page), lru); > + /* move to processed list to avoid going over it another time */ > + list_move(&page->lru, &processed); > + > + if (!get_page_unless_zero(page)) > + continue; > + /* > + * pages_lock nests within page lock, > + * so drop it before trylock_page > + */ > + spin_unlock_irqrestore(&b_dev_info->pages_lock, flags); > + > /* > * Block others from accessing the 'page' while we get around > * establishing additional references and preparing the 'page' > @@ -72,6 +94,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info) > if (!PagePrivate(page)) { > /* raced with isolation */ > unlock_page(page); > + put_page(page); > continue; > } > #endif > @@ -80,11 +103,18 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info) > __count_vm_event(BALLOON_DEFLATE); > spin_unlock_irqrestore(&b_dev_info->pages_lock, flags); > unlock_page(page); > + put_page(page); > dequeued_page = true; > break; ^^^^[1] > } > + put_page(page); > + spin_lock_irqsave(&b_dev_info->pages_lock, flags); > } > > + /* re-add remaining entries */ > + list_splice(&processed, &b_dev_info->pages); By breaking the loop at its ordinary and expected way-out case [1] we'll hit list_splice without holding b_dev_info->pages_lock, won't we? perhaps by adding the following on top of your patch we can address that pickle aforementioned: Cheers! Rafael -- diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c index 66d69c5..74b3e9c 100644 --- a/mm/balloon_compaction.c +++ b/mm/balloon_compaction.c @@ -58,7 +58,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info) { struct page *page; unsigned long flags; - bool dequeued_page; + bool dequeued_page, locked; LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */ dequeued_page = false; @@ -105,13 +105,17 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info) unlock_page(page); put_page(page); dequeued_page = true; + locked = false; break; } put_page(page); spin_lock_irqsave(&b_dev_info->pages_lock, flags); + locked = true; } /* re-add remaining entries */ + if (!locked) + spin_lock_irqsave(&b_dev_info->pages_lock, flags); list_splice(&processed, &b_dev_info->pages); spin_unlock_irqrestore(&b_dev_info->pages_lock, flags); -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html