Re: BUG at mm/memory.c:1489!

Hugh Dickins <hughd@xxxxxxxxxx> · Thu, 29 May 2014 13:50:05 -0700 (PDT)

On Thu, 29 May 2014, Naoya Horiguchi wrote:
> On Wed, May 28, 2014 at 05:33:11PM -0700, Hugh Dickins wrote:
> > On Wed, 28 May 2014, Michael Ellerman wrote:
> > > Hey folks,
> > > 
> > > Anyone seen this before? Trinity hit it just now:
> > > 
> > > Linux Blade312-5 3.15.0-rc7 #306 SMP Wed May 28 17:51:18 EST 2014 ppc64
> > > 
> > > [watchdog] 27853 iterations. [F:22642 S:5174 HI:1276]
> > > ------------[ cut here ]------------
> > > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > > cpu 0xc: Vector: 700 (Program Check) at [c000000384eaf960]
> > >     pc: c0000000001ad6f0: .follow_page_mask+0x90/0x650
> > >     lr: c0000000001ad6d8: .follow_page_mask+0x78/0x650
> > >     sp: c000000384eafbe0
> > >    msr: 8000000000029032
> > >   current = 0xc0000003c27e1bc0
> > >   paca    = 0xc000000001dc3000   softe: 0        irq_happened: 0x01
> > >     pid   = 20800, comm = trinity-c12
> > > kernel BUG at /home/michael/mmk-build/flow/mm/memory.c:1489!
> > > enter ? for help
> > > [c000000384eafcc0] c0000000001e5514 .SyS_move_pages+0x524/0x7d0
> > > [c000000384eafe30] c00000000000a1d8 syscall_exit+0x0/0x98
> > > --- Exception: c01 (System Call) at 00003fff795f30a8
> > > SP (3ffff958f290) is in userspace
> > > 
> > > I've left it in the debugger, can dig into it a bit more tomorrow
> > > if anyone has any clues.
> > 
> > Thanks for leaving it overnight, but this one is quite obvious,
> > so go ahead and reboot whenever suits you.
> > 
> > Trinity didn't even need to do anything bizarre to get this: that
> > ordinary path simply didn't get tried on powerpc or ia64 before.
> > 
> > Here's a patch which should fix it for you, but I believe leaves
> > a race in common with other architectures.  I must turn away to
> > other things, and hope Naoya-san can fix up the locking separately
> > (or point out why it's already safe).
> > 
> > [PATCH] mm: fix move_pages follow_page huge_addr BUG
> > 
> > v3.12's e632a938d914 ("mm: migrate: add hugepage migration code to
> > move_pages()") is okay on most arches, but on follow_huge_addr-style
> > arches ia64 and powerpc, it hits my old BUG_ON(flags & FOLL_GET)
> > from v2.6.15 deceb6cd17e6 ("mm: follow_page with inner ptlock").
> > 
> > The point of the BUG_ON was that nothing needed FOLL_GET there at
> > the time, and it was not clear that we have sufficient locking to
> > use get_page() safely here on the outside - maybe the page found has
> > already been freed and even reused when follow_huge_addr() returns.
> > 
> > I suspect that e632a938d914's use of get_page() after return from
> > follow_huge_pmd() has the same problem: what prevents a racing
> > instance of move_pages() from already migrating away and freeing
> > that page by then?  A reference to the page should be taken while
> > holding suitable lock (huge_pte_lockptr?), to serialize against
> > concurrent migration.
> 
> Right, we need take huge_pte_lockptr() here, I think.
> 
> > But I'm not prepared to rework the hugetlb locking here myself;
> > so for now just supply a patch to copy e632a938d914's get_page()
> > after follow_huge_pmd() to after follow_huge_addr(): removing
> > the BUG_ON(flags & FOLL_GET), but probably leaving a race.
> 
> This bug was introduced by me, so I'll fix this.
> Thank you for reporting.
> 
> > Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
> > Reported-by: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> > Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx # 3.12+
> 
> This patch looks good to me.
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>

Thanks, but seeing Michael's further problems once this is in,
I think we are better off for now with your non-x86_64 disablement.
But that's not complete in itself, I'll comment there.

Hugh

> 
> > ---
> > Whether this is a patch that should go in without fixing the locking,
> > I don't know.  An unlikely race is better than a triggerable BUG?
> > Or perhaps I'm just wrong about there being any such race.
> > 
> >  mm/memory.c |   12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > --- 3.15-rc7/mm/memory.c	2014-04-27 23:55:53.608801152 -0700
> > +++ linux/mm/memory.c	2014-05-28 13:05:48.340124615 -0700
> > @@ -1486,7 +1486,17 @@ struct page *follow_page_mask(struct vm_
> >  
> >  	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
> >  	if (!IS_ERR(page)) {
> > -		BUG_ON(flags & FOLL_GET);
> > +		if (page && (flags & FOLL_GET)) {
> > +			/*
> > +			 * Refcount on tail pages are not well-defined and
> > +			 * shouldn't be taken. The caller should handle a NULL
> > +			 * return when trying to follow tail pages.
> > +			 */
> > +			if (PageHead(page))
> > +				get_page(page);
> > +			else
> > +				page = NULL;
> > +		}
> >  		goto out;
> >  	}
> >  
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
> > 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
> 
--
To unsubscribe from this list: send the line "unsubscribe trinity" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html