On Fri, Jul 13, 2018 at 01:40:02PM -0700, Andrew Morton wrote:
> On Fri, 13 Jul 2018 12:26:06 +0900 Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> wrote:
>
> > A process can be killed with SIGBUS(BUS_MCEERR_AR) when it tries to
> > allocate a page that was just freed on the way of soft-offline.
> > This is undesirable because soft-offline (which is about corrected error)
> > is less aggressive than hard-offline (which is about uncorrected error),
> > and we can make soft-offline fail and keep using the page for good reason
> > like "system is busy."
> >
> > Two main changes of this patch are:
> >
> > - setting migrate type of the target page to MIGRATE_ISOLATE. As done
> >   in free_unref_page_commit(), this makes kernel bypass pcplist when
> >   freeing the page. So we can assume that the page is in freelist just
> >   after put_page() returns,
> >
> > - setting PG_hwpoison on free page under zone->lock which protects
> >   freelists, so this allows us to avoid setting PG_hwpoison on a page
> >   that is decided to be allocated soon.
> >
> >
> > ...
> >
> > +
> > +#ifdef CONFIG_MEMORY_FAILURE
> > +/*
> > + * Set PG_hwpoison flag if a given page is confirmed to be a free page
> > + * within zone lock, which prevents the race against page allocation.
> > + */
>
> I think this is clearer?
>
> --- a/mm/page_alloc.c~mm-soft-offline-close-the-race-against-page-allocation-fix
> +++ a/mm/page_alloc.c
> @@ -8039,8 +8039,9 @@ bool is_free_buddy_page(struct page *pag
>
>  #ifdef CONFIG_MEMORY_FAILURE
>  /*
> - * Set PG_hwpoison flag if a given page is confirmed to be a free page
> - * within zone lock, which prevents the race against page allocation.
> + * Set PG_hwpoison flag if a given page is confirmed to be a free page.  This
> + * test is performed under the zone lock to prevent a race against page
> + * allocation.

Yes, I like it.

Thanks,
Naoya Horiguchi
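
For readers following the thread, the locking idea in the reworded comment can be sketched in userspace. This is only a toy model under stated assumptions, not the code from the patch: `toy_zone`, `toy_page`, `toy_alloc()` and `toy_set_hwpoison_free_page()` are hypothetical names, a pthread mutex stands in for zone->lock, and two booleans stand in for "page is on the freelist" and PG_hwpoison. The model also assumes the allocation path refuses a poisoned page, which is a simplification for illustration.

```c
/* Userspace toy model only; NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool is_free;   /* stands in for "page is on the buddy freelist" */
	bool hwpoison;  /* stands in for PG_hwpoison */
};

struct toy_zone {
	pthread_mutex_t lock;  /* stands in for zone->lock */
};

/*
 * Allocation path of the model: hand out the page only if it is still
 * free and not poisoned. The check and the state change happen under
 * the lock, so they cannot interleave with the poisoning path below.
 */
bool toy_alloc(struct toy_zone *z, struct toy_page *p)
{
	bool ok = false;

	assert(z != NULL && p != NULL);
	pthread_mutex_lock(&z->lock);
	if (p->is_free && !p->hwpoison) {
		p->is_free = false;
		ok = true;
	}
	pthread_mutex_unlock(&z->lock);
	return ok;
}

/*
 * Soft-offline path of the model: set the poison flag only if the page
 * is confirmed to be free, with the test done under the same lock.
 * Returns false when the page was already allocated, so the caller can
 * treat the soft-offline attempt as failed instead of poisoning a page
 * that is in use.
 */
bool toy_set_hwpoison_free_page(struct toy_zone *z, struct toy_page *p)
{
	bool poisoned = false;

	assert(z != NULL && p != NULL);
	pthread_mutex_lock(&z->lock);
	if (p->is_free) {
		p->hwpoison = true;
		poisoned = true;
	}
	pthread_mutex_unlock(&z->lock);
	return poisoned;
}
```

Because both paths take the same lock, exactly one of them wins for a given free page: either the page is poisoned and the later allocation is refused, or the page is allocated and the poisoning attempt reports failure.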