On Tue, Sep 22, 2020 at 03:56:50PM +0200, Oscar Salvador wrote:
> Aristeu Rozanski reported that a customer test case started
> to report -EBUSY after the hwpoison rework patchset.
>
> There is a race window between spotting a free page and taking it off
> its buddy freelist, so it might be that by the time we try to take it off,
> the page has been already allocated.
>
> This patch tries to handle such race window by trying to handle the new
> type of page again if the page was allocated under us.
>
> Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
> Reported-by: Aristeu Rozanski <aris@xxxxxxxxx>
> Tested-by: Aristeu Rozanski <aris@xxxxxxxxx>

Acked-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>

> ---
>  mm/memory-failure.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 46b1821d2817..8f23d3c7a0a2 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1903,6 +1903,7 @@ int soft_offline_page(unsigned long pfn, int flags)
>  {
>  	int ret;
>  	struct page *page;
> +	bool try_again = true;
>
>  	if (!pfn_valid(pfn))
>  		return -ENXIO;
> @@ -1918,6 +1919,7 @@ int soft_offline_page(unsigned long pfn, int flags)
>  		return 0;
>  	}
>
> +retry:
>  	get_online_mems();
>  	ret = get_any_page(page, pfn, flags);
>  	put_online_mems();
> @@ -1925,7 +1927,10 @@ int soft_offline_page(unsigned long pfn, int flags)
>  	if (ret > 0)
>  		ret = soft_offline_in_use_page(page);
>  	else if (ret == 0)
> -		ret = soft_offline_free_page(page);
> +		if (soft_offline_free_page(page) && try_again) {
> +			try_again = false;
> +			goto retry;
> +		}
>
>  	return ret;
>  }
> --
> 2.26.2
>