Hi Mike, On Wed, May 29, 2019 at 11:44:50AM -0700, Mike Kravetz wrote: > On 5/26/19 11:06 PM, Naoya Horiguchi wrote: > > Soft offline events for hugetlb pages return -EBUSY when page migration > > succeeded and dissolve_free_huge_page() failed, which can happen when > > there're surplus hugepages. We should judge pass/fail of soft offline by > > checking whether the raw error page was finally contained or not (i.e. > > the result of set_hwpoison_free_buddy_page()), so this behavior is wrong. > > > > This problem was introduced by the following change of commit 6bc9b56433b76 > > ("mm: fix race on soft-offlining"): > > > > if (ret > 0) > > ret = -EIO; > > } else { > > - if (PageHuge(page)) > > - dissolve_free_huge_page(page); > > + /* > > + * We set PG_hwpoison only when the migration source hugepage > > + * was successfully dissolved, because otherwise hwpoisoned > > + * hugepage remains on free hugepage list, then userspace will > > + * find it as SIGBUS by allocation failure. That's not expected > > + * in soft-offlining. > > + */ > > + ret = dissolve_free_huge_page(page); > > + if (!ret) { > > + if (set_hwpoison_free_buddy_page(page)) > > + num_poisoned_pages_inc(); > > + } > > } > > return ret; > > } > > > > , so a simple fix is to restore the PageHuge precheck, but my code > > reading shows that we already have PageHuge check in > > dissolve_free_huge_page() with hugetlb_lock, which is better place to > > check it. And currently dissolve_free_huge_page() returns -EBUSY for > > !PageHuge but that's simply wrong because that that case should be > > considered as success (meaning that "the given hugetlb was already > > dissolved.") > > Hello Naoya, > > I am having a little trouble understanding the situation. The code above is > in the routine soft_offline_huge_page, and occurs immediately after a call to > migrate_pages() with 'page' being the only on the list of pages to be migrated. > In addition, since we are in soft_offline_huge_page, we know that page is > a huge page (PageHuge) before the call to migrate_pages. > > IIUC, the issue is that the migrate_pages call results in 'page' being > dissolved into regular base pages. Therefore, the call to > dissolve_free_huge_page returns -EBUSY and we never end up setting PageHWPoison > on the (base) page which had the error. > > It seems that for the original page to be dissolved, it must go through the > free_huge_page routine. Once that happens, it is possible for the (dissolved) > pages to be allocated again. Is that just a known race, or am I missing > something? No, your understanding is right. I found that the last (and most important) part of patch description ("this behavior is wrong") might be wrong. Sorry about that and let me correct myself: - before commit 6bc9b56433b76, the return value of soft offline is the return of migrate_page(). dissolve_free_huge_page()'s return value is ignored. - after commit 6bc9b56433b76 soft_offline_huge_page() returns success only dissolve_free_huge_page() returns success. This change is *mainly OK* (meaning nothing is broken), but there still remains the room of improvement, that is, even in "dissolved from free_huge_page()" case, we can try to call set_hwpoison_free_buddy_page() to contain the 4kB error page, but we don't try it now because dissolve_free_huge_page() return -EBUSY for !PageHuge case. > > > This change affects other callers of dissolve_free_huge_page(), > > which are also cleaned up by this patch. > > It may just be me, but I am having a hard time separating the fix for this > issue from the change to the dissolve_free_huge_page routine. Would it be > more clear or possible to create separate patches for these? Yes, the change is actually an 'improvement' purely related to hugetlb, and seems not a 'bug fix'. So I'll update the description. Maybe no need to separate to patches. Thanks, Naoya Horiguchi