On Wed, Mar 19, 2014 at 12:24:44PM -0400, Sasha Levin wrote:
> On 03/18/2014 10:29 PM, Naoya Horiguchi wrote:
> >We have a race where we try to migrate an invalid page, resulting in
> >hitting VM_BUG_ON_PAGE in isolate_huge_page().
> >queue_pages_hugetlb() is OK to fail, so let's check !PageHeadHuge to keep
> >invalid hugepage from queuing.
> >
> >Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
> >Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> >---
> > mm/mempolicy.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> >diff --git v3.14-rc7-mmotm-2014-03-18-16-37.orig/mm/mempolicy.c v3.14-rc7-mmotm-2014-03-18-16-37/mm/mempolicy.c
> >index 9d2ef4111a4c..ae6e2d9dc855 100644
> >--- v3.14-rc7-mmotm-2014-03-18-16-37.orig/mm/mempolicy.c
> >+++ v3.14-rc7-mmotm-2014-03-18-16-37/mm/mempolicy.c
> >@@ -530,6 +530,17 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long addr,
> > 	if (!pte_present(entry))
> > 		return 0;
> > 	page = pte_page(entry);
> >+
> >+	/*
> >+	 * Trinity found that page could be a non-hugepage. This is an
> >+	 * unexpected behavior, but it's not clear how this problem happens.
> >+	 * So let's simply skip such corner case. Page migration can often
> >+	 * fail for various reasons, so it's ok to just skip the address
> >+	 * unsuitable to hugepage migration.
> >+	 */
> >+	if (!PageHeadHuge(page))
> >+		return 0;
> >+
> > 	nid = page_to_nid(page);
> > 	if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT))
> > 		return 0;
> >
>
> I have to say that I really dislike this method of solving the issue.

Yes, I understand that this is not the best solution.

> I think it's something fine to do for testing, but this will just hide this issue
> and will let it sneak upstream. I'm really not sure if the trace I've reported is
> the only codepath that would trigger it, so if we let it sneak upstream we're risking
> of someone hitting it some other way.

Unfortunately, I didn't have a reliable reproducer that targets this problem
(in my trials trinity hit other errors rather than this one, so it gave me no
crucial hint for a detailed analysis). I think that if the problem is
reproduced in a different way, that could give us more information about how
it happens.

What I'm suggesting here is not a final-form fix, but a kind of "needinfo".
I must (and will) work on this more after the LSFMM summit.

Thanks,
Naoya