Re: [PATCH] Fix incorrect compound page flags store

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 18, 2020 at 11:35:02AM +0200, Oscar Salvador wrote:
> On Tue, Sep 08, 2020 at 09:05:56AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > > thp just after passing over the following block:
> > >
> > >
> > >
> > > >
> > > >   1408          if (PageTransHuge(hpage)) {
> > > >   1409                  if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> > > >   1410                          action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> > > >   1411                          return -EBUSY;
> > > >   1412                  }
> > > >   1413                  VM_BUG_ON_PAGE(!page_count(p), p);
> > > >   1414          }
> > > >
> > > > So I feel that some check might be added after holding page lock to avoid
> > > > that case. Or acutally, it might better that moving the above block into
> > > > page lock is more better for simpler code.
> > >
> > > I will have a look at this.
> >
> > Thank you!
>
> Hi Naoya,
>
> I have been taking a look at this, and unless I am missing something obvious I
> do not think that a new THP (containing the page) can be collapsed under us:
>
> We do take a refcount on the page by means of get_hwpoison_page.
> We could only have done that if the page was mapped, so its refcount was already
> above 0.
>
> Then we split the THP, and the refcount/mapcount go to the page we are trying to
> poison.
> At this point the page should add least have refcount > 1 mapcount >= 1.
>
> After that, let us assume that a new THP is trying to be collapsed by means of
> khugepaged thread or madvise MADV_HUGEPAGE.
>
> khugepaged_scan_pmd() scans all ptes from [pte#0..pte#511] to see if they can
> be collapsed, and one of the things it does is checking the page's refcount/
> mappcount by calling is_refcount_suitable().
>
> 	expected_refcount = total_mapcount(page);
> 	return page_count(page) == expected_refcount;
>
> We do have an extra pin from memory_failure, so this is going to fail because
>
> page: refcount = 2 , mapcount = 1
>
> Beware that the page must sitll be mapped somehow, otherwise the PageLRU check
> from above should have failed with the same result:
>
> 	if (!PageLRU(page)) {
> 		result = SCAN_PAGE_LRU;
> 		goto out_unmap;
> 	}
>
> So, I do not think the page can be collapsed into a new THP after we have split
> it here, but as I said, I might be missing something.

This logic sounds convincing to me, or another possibility like conversion to
other types of compound_page (like slab) is also prevented due to the refcount.

The MF_MSG_DIFFERENT_COMPOUND path was originally introduced heuristically
based on error report in stress testing, and the mechanism of the problem
was unclear.

Thanks,
Naoya Horiguchi




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux