On Tue, May 24, 2022 at 7:05 AM Jue Wang <juew@xxxxxxxxxx> wrote:
>
> On Tue, May 24, 2022 at 3:48 AM Oscar Salvador <osalvador@xxxxxxx> wrote:
> >
> > On Mon, May 23, 2022 at 07:53:51PM -0700, Jiaqi Yan wrote:
> > > Make __collapse_huge_page_copy return whether
> > > collapsing/copying anonymous pages succeeded,
> > > and make collapse_huge_page handle the return status.
> > >
> > > Break existing PTE scan loop into two for-loops.
> > > The first loop copies source pages into target huge page,
> > > and can fail gracefully when running into memory errors in
> > > source pages. Roll back the page table and page states
> > > when copying failed:
> > > 1) re-establish the PTEs-to-PMD connection.
> > > 2) release pages back to their LRU list.
> >
> > If you spell out what the first loop does, just tell
> > what the second loop does as well, it just gets easier.

Thanks for the suggestion. I will amend the commit msg in the next
version as follows:

If copying all pages succeeds, the second loop releases and clears up
these normal pages. Otherwise, the second loop does the following to
roll back the page table and page states:
1) re-establish the original PTEs-to-PMD connection.
2) release source pages back to their LRU list.

>
> > > +static bool __collapse_huge_page_copy(pte_t *pte,
> > > +				       struct page *page,
> > > +				       pmd_t *pmd,
> > > +				       pmd_t rollback,
> > > +				       struct vm_area_struct *vma,
> > > +				       unsigned long address,
> > > +				       spinlock_t *pte_ptl,
> > > +				       struct list_head *compound_pagelist)
> > >  {
> > >  	struct page *src_page, *tmp;
> > >  	pte_t *_pte;
> > > -	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> > > -	     _pte++, page++, address += PAGE_SIZE) {
> > > -		pte_t pteval = *_pte;
> > > +	pte_t pteval;
> > > +	unsigned long _address;
> > > +	spinlock_t *pmd_ptl;
> > > +	bool copy_succeeded = true;
> > >
> > > -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > > +	/*
> > > +	 * Copying pages' contents is subject to memory poison at any iteration.
> > > +	 */
> > > +	for (_pte = pte, _address = address;
> > > +	     _pte < pte + HPAGE_PMD_NR;
> > > +	     _pte++, page++, _address += PAGE_SIZE) {
> > > +		pteval = *_pte;
> > > +
> > > +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval)))
> > >  			clear_user_highpage(page, address);
> > > -			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
> > > -			if (is_zero_pfn(pte_pfn(pteval))) {
> > > -				/*
> > > -				 * ptl mostly unnecessary.
> > > -				 */
> > > -				spin_lock(ptl);
> > > -				ptep_clear(vma->vm_mm, address, _pte);
> > > -				spin_unlock(ptl);
> > > +		else {
> > > +			src_page = pte_page(pteval);
> > > +			if (copy_highpage_mc(page, src_page)) {
> > > +				copy_succeeded = false;
> > > +				trace_mm_collapse_huge_page_copy(pte_page(*pte),
> > > +						src_page, SCAN_COPY_MC);
> >
> > You seem to assume that if there is an error, it will always happen on
> > the page we are copying from. What if the page we are copying to is the
> > faulty one? Can that happen? Can that be detected by copy_mc_to_kernel?
> > And if so, can that be differentiated?
>
> It's possible that the copy to page has some uncorrectable memory errors.
>
> Yet only read transactions signal machine check exceptions, while
> write transactions do not.
>
> It's the reading from the source page with uncorrectable errors
> that causes most of the system panics (since khugepaged is a
> kernel-context thread).
>
> Thanks,
> -Jue
>
> > --
> > Oscar Salvador
> > SUSE Labs