On Sun, Apr 23, 2023 at 1:47 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74 > ("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()") > ahead of Jiaqi Yan's and David Stevens's commits > 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory") > cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow") > ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd") > (all of which restructure collapse_file()) did not work out well. > > xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes > (if not on the first attempt, then the 2nd or 3rd) in find_lock_entries() > while doing drop_caches: the file's xarray seems to have been corrupted, > with find_get_entry() returning nonsense which makes no progress. > > Bisection led to ac492b9c70ca; and diff against earlier working linux-next > suggested that it's probably down to an errant xas_store(), which does not > belong with the later changes (and nor does the positioning of warnings). > The later changes look as if they fix the syzbot issue independently. > > Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE > (xas_error) after the final xas_store() of the multi-index entry. > > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> > --- > > mm/khugepaged.c | 23 +---------------------- > 1 file changed, 1 insertion(+), 22 deletions(-) > > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -1941,16 +1941,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > result = SCAN_FAIL; > goto xa_locked; > } > - xas_store(&xas, hpage); > - if (xas_error(&xas)) { > - /* revert shmem_charge performed > - * in the previous condition > - */ > - mapping->nrpages--; > - shmem_uncharge(mapping->host, 1); > - result = SCAN_STORE_FAILED; With this being removed, SCAN_STORE_FAILED should also be removed from the scan_result enum and trace event definitions. -David > - goto xa_locked; > - } > nr_none++; > continue; > } > @@ -2105,13 +2095,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > * Accumulate the pages that are being collapsed. > */ > list_add_tail(&page->lru, &pagelist); > - > - /* > - * We can't get an ENOMEM here (because the allocation happened > - * before) but let's check for errors (XArray implementation > - * can be changed in the future) > - */ > - WARN_ON_ONCE(xas_error(&xas)); > continue; > out_unlock: > unlock_page(page); > @@ -2134,11 +2117,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > } > } > > - /* Here we can't get an ENOMEM (because entries were > - * previously allocated) But let's check for errors > - * (XArray implementation can be changed in the future) > - */ > - WARN_ON_ONCE(xas_error(&xas)); > xa_locked: > xas_unlock_irq(&xas); > xa_unlocked: > @@ -2259,6 +2237,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > /* Join all the small entries into a single multi-index entry. */ > xas_set_order(&xas, start, HPAGE_PMD_ORDER); > xas_store(&xas, hpage); > + WARN_ON_ONCE(xas_error(&xas)); > xas_unlock_irq(&xas); > > /*