On Sat, 22 Apr 2023 21:47:20 -0700 (PDT) Hugh Dickins <hughd@xxxxxxxxxx> wrote:

> Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74
> ("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()")
> ahead of Jiaqi Yan's and David Stevens's commits
> 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory")
> cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd")
> (all of which restructure collapse_file()) did not work out well.
>
> xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes
> (if not on the first attempt, then the 2nd or 3rd) in find_lock_entries()
> while doing drop_caches: the file's xarray seems to have been corrupted,
> with find_get_entry() returning nonsense which makes no progress.
>
> Bisection led to ac492b9c70ca; and diff against earlier working linux-next
> suggested that it's probably down to an errant xas_store(), which does not
> belong with the later changes (and nor does the positioning of warnings).
> The later changes look as if they fix the syzbot issue independently.
>
> Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE
> (xas_error) after the final xas_store() of the multi-index entry.
>

Sigh. Thanks. I thought I'd successfully sorted that mess out.