On Fri, Mar 24, 2023 at 5:39 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > On Fri, 24 Mar 2023, Jiaqi Yan wrote: > > On Fri, Mar 24, 2023 at 2:15 PM Yang Shi <shy828301@xxxxxxxxx> wrote: > > > On Sat, Mar 4, 2023 at 10:51 PM Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote: > > > > > > > > Make collapse_file roll back when copying pages failed. More concretely: > > > > - extract copying operations into a separate loop > > > > - postpone the updates for nr_none until both scanning and copying > > > > succeeded > > > > - postpone joining small xarray entries until both scanning and copying > > > > succeeded > > > > - postpone the update operations to NR_XXX_THPS until both scanning and > > > > copying succeeded > > > > - for non-SHMEM file, roll back filemap_nr_thps_inc if scan succeeded but > > > > copying failed > > > > > > > > Tested manually: > > > > 0. Enable khugepaged on system under test. Mount tmpfs at /mnt/ramdisk. > > > > 1. Start a two-thread application. Each thread allocates a chunk of > > > > non-huge memory buffer from /mnt/ramdisk. > > > > 2. Pick 4 random buffer address (2 in each thread) and inject > > > > uncorrectable memory errors at physical addresses. > > > > 3. Signal both threads to make their memory buffer collapsible, i.e. > > > > calling madvise(MADV_HUGEPAGE). > > > > 4. Wait and then check kernel log: khugepaged is able to recover from > > > > poisoned pages by skipping them. > > > > 5. Signal both threads to inspect their buffer contents and make sure no > > > > data corruption. > > > > > > > > Signed-off-by: Jiaqi Yan <jiaqiyan@xxxxxxxxxx> > > > > > > Reviewed-by: Yang Shi <shy828301@xxxxxxxxx> > > > > > > Just a nit below: > > Acked-by: Hugh Dickins <hughd@xxxxxxxxxx> > > with a little nit from me below, if you are respinning: > > > > > > > > --- > > > > mm/khugepaged.c | 78 ++++++++++++++++++++++++++++++------------------- > > > > 1 file changed, 48 insertions(+), 30 deletions(-) > > > > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > > index c3c217f6ebc6e..3ea2aa55c2c52 100644 > > > > --- a/mm/khugepaged.c > > > > +++ b/mm/khugepaged.c > > > > @@ -1890,6 +1890,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > > > > { > > > > struct address_space *mapping = file->f_mapping; > > > > struct page *hpage; > > > > + struct page *page; > > > > + struct page *tmp; > > > > + struct folio *folio; > > > > pgoff_t index = 0, end = start + HPAGE_PMD_NR; > > > > LIST_HEAD(pagelist); > > > > XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER); > > > > @@ -1934,8 +1937,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > > > > > > > > xas_set(&xas, start); > > > > for (index = start; index < end; index++) { > > > > - struct page *page = xas_next(&xas); > > > > - struct folio *folio; > > > > + page = xas_next(&xas); > > > > > > > > VM_BUG_ON(index != xas.xa_index); > > > > if (is_shmem) { > > > > @@ -2117,10 +2119,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > > > > } > > > > nr = thp_nr_pages(hpage); > > > > > > > > - if (is_shmem) > > > > - __mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr); > > > > - else { > > > > - __mod_lruvec_page_state(hpage, NR_FILE_THPS, nr); > > > > + if (!is_shmem) { > > > > filemap_nr_thps_inc(mapping); > > > > /* > > > > * Paired with smp_mb() in do_dentry_open() to ensure > > That "nr = thp_nr_pages(hpage);" above becomes stranded a long way away > from where "nr" is actually used for updating those statistics: please > move it down with them. (I see "nr" is also reported in the tracepoint > at the end, FWIW, so maybe that will show "0" in more failure cases than > it used to, but that's okay - it has been decently initialized.) > Thanks Hugh! I will make sure V11 moves "nr" closer to the place it is used. > Thanks, > Hugh