On Tue, Sep 08, 2020 at 09:05:56AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote: > > > thp just after passing over the following block: > > > > > > > > > > > > 1408 if (PageTransHuge(hpage)) { > > > 1409 if (try_to_split_thp_page(p, "Memory Failure") < 0) { > > > 1410 action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED); > > > 1411 return -EBUSY; > > > 1412 } > > > 1413 VM_BUG_ON_PAGE(!page_count(p), p); > > > 1414 } > > > > > > So I feel that some check might be added after holding page lock to avoid > > > that case. Or acutally, it might better that moving the above block into > > > page lock is more better for simpler code. > > > > I will have a look at this. > > Thank you! Hi Naoya, I have been taking a look at this, and unless I am missing something obvious I do not think that a new THP (containing the page) can be collapsed under us: We do take a refcount on the page by means of get_hwpoison_page. We could only have done that if the page was mapped, so its refcount was already above 0. Then we split the THP, and the refcount/mapcount go to the page we are trying to poison. At this point the page should add least have refcount > 1 mapcount >= 1. After that, let us assume that a new THP is trying to be collapsed by means of khugepaged thread or madvise MADV_HUGEPAGE. khugepaged_scan_pmd() scans all ptes from [pte#0..pte#511] to see if they can be collapsed, and one of the things it does is checking the page's refcount/ mappcount by calling is_refcount_suitable(). expected_refcount = total_mapcount(page); return page_count(page) == expected_refcount; We do have an extra pin from memory_failure, so this is going to fail because page: refcount = 2 , mapcount = 1 Beware that the page must sitll be mapped somehow, otherwise the PageLRU check from above should have failed with the same result: if (!PageLRU(page)) { result = SCAN_PAGE_LRU; goto out_unmap; } So, I do not think the page can be collapsed into a new THP after we have split it here, but as I said, I might be missing something. Thoughts? -- Oscar Salvador SUSE L3