On Mon, Jun 29, 2020 at 12:29:25PM +0200, Oscar Salvador wrote: > On Wed, 2020-06-24 at 15:01 +0000, nao.horiguchi@xxxxxxxxx wrote: > > I rebased soft-offline rework patchset [1][2] onto the latest > > mmotm. The > > rebasing required some non-trivial changes to adjust, but mainly that > > was > > straightforward. I confirmed that the reported problem doesn't > > reproduce on > > compaction after soft offline. For more precise description of the > > problem > > and the motivation of this patchset, please see [2]. > > Hi Naoya, > > Thanks for dusting this off. > To be honest, I got stuck with the hard offline mode so this delayed > the resubmission, along other problems. > > > I think that the following two patches in v2 are better to be done > > with > > separate work of hard-offline rework, so it's not included in this > > series. > > > > - mm,hwpoison: Take pages off the buddy when hard-offlining > > - mm/hwpoison-inject: Rip off duplicated checks > > > > These two are not directly related to the reported problem, so they > > seems > > not urgent. And the first one breaks num_poisoned_pages counting in > > some > > testcases, and The second patch needs more consideration about > > commented point. > > I fully agree. > > > Any comment/suggestion/help would be appreciated. > > My "new" version included a patch to make sure we give a chance to > pages that possibly are in a pcplist. > Current behavior is that if someone tries to soft-offline such a page, > we return an error because page count is 0 but page is not in the buddy > system. > > Since this patchset already landed in the mm tree, I could send it as a > standalone patch on top if you agree with it. > > My patch looked something like: > > From: Oscar Salvador <osalvador@xxxxxxx> > Date: Mon, 29 Jun 2020 12:25:11 +0200 > Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for > non-buddy > zero-refcount page > > A page with 0-refcount and !PageBuddy could perfectly be a pcppage. > Currently, we bail out with an error if we encounter such a page, > meaning that we do not give a chance to handle pcppages. > > Fix this by draining pcplists whenever we find this kind of page > and retry the check again. > It might be that pcplists have been spilled into the buddy allocator > and so we can handle it. > > Signed-off-by: Oscar Salvador <osalvador@xxxxxxx> I'm fine with this patch, so this will be with the next version with some minor change like moving function comment block together. Actually I feel we could refactor get_hwpoison_page() with get_any_page() to handle this kind of pcplist related issues in one place. But this could be more important in hard-offline reworking, and we don't have to do it in this series. Thanks, Naoya Horiguchi