Re: [PATCH v3 1/2] mm: khugepaged: recover from poisoned anonymous memory

Jue Wang <juew@xxxxxxxxxx> · Tue, 24 May 2022 07:05:37 -0700

On Tue, May 24, 2022 at 3:48 AM Oscar Salvador <osalvador@xxxxxxx> wrote:
>
> On Mon, May 23, 2022 at 07:53:51PM -0700, Jiaqi Yan wrote:
> > Make __collapse_huge_page_copy return whether
> > collapsing/copying anonymous pages succeeded,
> > and make collapse_huge_page handle the return status.
> >
> > Break existing PTE scan loop into two for-loops.
> > The first loop copies source pages into target huge page,
> > and can fail gracefully when running into memory errors in
> > source pages. Roll back the page table and page states
> > when copying failed:
> > 1) re-establish the PTEs-to-PMD connection.
> > 2) release pages back to their LRU list.
>
> If you spell out what the first loop does, just tell
> what the second loop does as wel, it just gets easier.
>
> > +static bool __collapse_huge_page_copy(pte_t *pte,
> > +                             struct page *page,
> > +                             pmd_t *pmd,
> > +                             pmd_t rollback,
> > +                             struct vm_area_struct *vma,
> > +                             unsigned long address,
> > +                             spinlock_t *pte_ptl,
> > +                             struct list_head *compound_pagelist)
> >  {
> >       struct page *src_page, *tmp;
> >       pte_t *_pte;
> > -     for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> > -                             _pte++, page++, address += PAGE_SIZE) {
> > -             pte_t pteval = *_pte;
> > +     pte_t pteval;
> > +     unsigned long _address;
> > +     spinlock_t *pmd_ptl;
> > +     bool copy_succeeded = true;
> >
> > -             if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > +     /*
> > +      * Copying pages' contents is subject to memory poison at any iteration.
> > +      */
> > +     for (_pte = pte, _address = address;
> > +                     _pte < pte + HPAGE_PMD_NR;
> > +                     _pte++, page++, _address += PAGE_SIZE) {
> > +             pteval = *_pte;
> > +
> > +             if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval)))
> >                       clear_user_highpage(page, address);
> > -                     add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
> > -                     if (is_zero_pfn(pte_pfn(pteval))) {
> > -                             /*
> > -                              * ptl mostly unnecessary.
> > -                              */
> > -                             spin_lock(ptl);
> > -                             ptep_clear(vma->vm_mm, address, _pte);
> > -                             spin_unlock(ptl);
> > +             else {
> > +                     src_page = pte_page(pteval);
> > +                     if (copy_highpage_mc(page, src_page)) {
> > +                             copy_succeeded = false;
> > +                             trace_mm_collapse_huge_page_copy(pte_page(*pte),
> > +                                             src_page, SCAN_COPY_MC);
>
> You seem to assume that if there is an error, it will always happen on
> the page we are copying from. What if the page we are copying to is the
> fauly one? Can that happen? Can that be detected by copy_mc_to_kernel?
> And if so, can that be differentiated?
It's possible that the copy to page has some uncorrectable memory errors.

Yet only read transactions signals machine check exceptions while
write transactions do not.

It's the reading from the source page with uncorrectable errors
causing system panics most (since khugepaged is a kernel context
thread).

Thanks,
-Jue
>
> --
> Oscar Salvador
> SUSE Labs