On Tue, May 24, 2022 at 7:05 AM Jue Wang <juew@xxxxxxxxxx> wrote:
>
> On Tue, May 24, 2022 at 3:48 AM Oscar Salvador <osalvador@xxxxxxx> wrote:
> >
> > On Mon, May 23, 2022 at 07:53:51PM -0700, Jiaqi Yan wrote:
> > > Make __collapse_huge_page_copy return whether
> > > collapsing/copying anonymous pages succeeded,
> > > and make collapse_huge_page handle the return status.
> > >
> > > Break existing PTE scan loop into two for-loops.
> > > The first loop copies source pages into target huge page,
> > > and can fail gracefully when running into memory errors in
> > > source pages. Roll back the page table and page states
> > > when copying failed:
> > > 1) re-establish the PTEs-to-PMD connection.
> > > 2) release pages back to their LRU list.
> >
> > If you spell out what the first loop does, just tell
> > what the second loop does as well, it just gets easier.

Thanks for the suggestion. I will amend the commit msg in the next
version as follows:

If copying all pages succeeds, the second loop releases and clears up
these normal pages. Otherwise, the second loop does the following to
roll back the page table and page states:
1) re-establish the original PTEs-to-PMD connection.
2) release source pages back to their LRU list.

>
> > > +static bool __collapse_huge_page_copy(pte_t *pte,
> > > +				       struct page *page,
> > > +				       pmd_t *pmd,
> > > +				       pmd_t rollback,
> > > +				       struct vm_area_struct *vma,
> > > +				       unsigned long address,
> > > +				       spinlock_t *pte_ptl,
> > > +				       struct list_head *compound_pagelist)
> > >  {
> > >  	struct page *src_page, *tmp;
> > >  	pte_t *_pte;
> > > -	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> > > -	     _pte++, page++, address += PAGE_SIZE) {
> > > -		pte_t pteval = *_pte;
> > > +	pte_t pteval;
> > > +	unsigned long _address;
> > > +	spinlock_t *pmd_ptl;
> > > +	bool copy_succeeded = true;
> > >
> > > -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > > +	/*
> > > +	 * Copying pages' contents is subject to memory poison at any iteration.
> > > +	 */
> > > +	for (_pte = pte, _address = address;
> > > +	     _pte < pte + HPAGE_PMD_NR;
> > > +	     _pte++, page++, _address += PAGE_SIZE) {
> > > +		pteval = *_pte;
> > > +
> > > +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval)))
> > >  			clear_user_highpage(page, address);
> > > -			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
> > > -			if (is_zero_pfn(pte_pfn(pteval))) {
> > > -				/*
> > > -				 * ptl mostly unnecessary.
> > > -				 */
> > > -				spin_lock(ptl);
> > > -				ptep_clear(vma->vm_mm, address, _pte);
> > > -				spin_unlock(ptl);
> > > +		else {
> > > +			src_page = pte_page(pteval);
> > > +			if (copy_highpage_mc(page, src_page)) {
> > > +				copy_succeeded = false;
> > > +				trace_mm_collapse_huge_page_copy(pte_page(*pte),
> > > +						src_page, SCAN_COPY_MC);
> >
> > You seem to assume that if there is an error, it will always happen on
> > the page we are copying from. What if the page we are copying to is the
> > faulty one? Can that happen? Can that be detected by copy_mc_to_kernel?
> > And if so, can that be differentiated?
>
> It's possible that the copy to page has some uncorrectable memory errors.
>
> Yet only read transactions signal machine check exceptions, while
> write transactions do not.
>
> It's the reading from the source page with uncorrectable errors
> that causes most of the system panics (since khugepaged is a
> kernel-context thread).
>
> Thanks,
> -Jue
>
> > --
> > Oscar Salvador
> > SUSE Labs