On Wed, Sep 15, 2021 at 4:00 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Wed, Sep 15, 2021 at 10:48 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
> >
> > On Wed, Sep 15, 2021 at 4:49 AM Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Sep 14, 2021 at 11:37:16AM -0700, Yang Shi wrote:
> > > > The khugepaged does check if the page is on LRU or not but it doesn't
> > > > hold page lock. And it doesn't check this again after holding page
> > > > lock. So it may race with some others, e.g. reclaimer, migration, etc.
> > > > All of them isolates page from LRU then lock the page then do something.
> > > >
> > > > But it could pass the refcount check done by khugepaged to proceed
> > > > collapse. Typically such race is not fatal. But if the page has been
> > > > isolated from LRU before khugepaged it likely means the page may be not
> > > > suitable for collapse for now.
> > > >
> > > > The other more fatal case is the following patch will keep the poisoned
> > > > page in page cache for shmem, so khugepaged may collapse a poisoned page
> > > > since the refcount check could pass. 3 refcounts come from:
> > > >   - hwpoison
> > > >   - page cache
> > > >   - khugepaged
> > > >
> > > > Since it is not on LRU so no refcount is incremented from LRU isolation.
> > > >
> > > > This is definitely not expected. Checking if it is on LRU or not after
> > > > holding page lock could help serialize against hwpoison handler.
> > > >
> > > > But there is still a small race window between setting hwpoison flag and
> > > > bump refcount in hwpoison handler. It could be closed by checking
> > > > hwpoison flag in khugepaged, however this race seems unlikely to happen
> > > > in real life workload. So just check LRU flag for now to avoid
> > > > over-engineering.
> > > >
> > > > Signed-off-by: Yang Shi <shy828301@xxxxxxxxx>
> > > > ---
> > > >  mm/khugepaged.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index 045cc579f724..bdc161dc27dc 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -1808,6 +1808,12 @@ static void collapse_file(struct mm_struct *mm,
> > > >                          goto out_unlock;
> > > >                  }
> > > >
> > > > +                /* The hwpoisoned page is off LRU but in page cache */
> > > > +                if (!PageLRU(page)) {
> > > > +                        result = SCAN_PAGE_LRU;
> > > > +                        goto out_unlock;
> > > > +                }
> > > > +
> > > >                  if (isolate_lru_page(page)) {
> > >
> > > isolate_lru_page() should catch the case, no? TestClearPageLRU would fail
> > > and we get here.
> >
> > Hmm... you are definitely right. How could I miss this point.
> >
> > It might be because I messed up the page state with some tests which
> > may do hole punch then reread the same index. That could drop the
> > poisoned page and then the collapse succeeds. But I'm not sure. Anyway I
> > didn't figure out how the poisoned page could be collapsed. It seems
> > impossible. I will drop this patch.
>
> I think I figured out the problem. This problem happened after the
> page cache split patch and if the hwpoisoned page is not the head page.
> It is because THP split will unfreeze the refcount of tail pages to 2
> (restore refcount from page cache) then dec refcount to 1. The
> refcount pin from hwpoison is gone and the page is still on LRU. Then,
> if khugepaged locks the page before hwpoison does, the refcount is what
> khugepaged expects.
>
> The worse thing is that this problem seems to apply to anonymous
> pages too. Once the anonymous THP is split by hwpoison, the pin from
> hwpoison is gone too and the refcount is 1 (it comes from the PTE map).
> Then khugepaged could collapse it to a huge page again. It may incur
> data corruption.
>
> And the poisoned page may be freed back to buddy because of the lost
> refcount pin.
>
> If the poisoned page is head page, the code is fine since hwpoison
> doesn't put the refcount for head page after split.
>
> The fix is simple, just keep the refcount pin for hwpoisoned subpage.

Err... wait... I just realized I missed the below code block:

        if (subpage == page)
                continue;

It skips the subpage passed to split_huge_page() so the refcount pin
from the caller for this subpage is kept. And hwpoison doesn't put it.
So it seems fine.

>
> >
> > >
> > > >                          result = SCAN_DEL_PAGE_LRU;
> > > >                          goto out_unlock;
> > > > --
> > > > 2.26.2
> > > >
> > > >
> > >
> > > --
> > >  Kirill A. Shutemov