On Mon, Apr 24, 2023 at 11:17 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > On Mon, 24 Apr 2023, David Stevens wrote: > > On Sun, Apr 23, 2023 at 1:47 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > > > > > Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74 > > > ("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()") > > > ahead of Jiaqi Yan's and David Stevens's commits > > > 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory") > > > cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow") > > > ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd") > > > (all of which restructure collapse_file()) did not work out well. > > > > > > xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes > > > (if not on the first attempt, then the 2nd or 3rd) in find_lock_entries() > > > while doing drop_caches: the file's xarray seems to have been corrupted, > > > with find_get_entry() returning nonsense which makes no progress. > > > > > > Bisection led to ac492b9c70ca; and diff against earlier working linux-next > > > suggested that it's probably down to an errant xas_store(), which does not > > > belong with the later changes (and nor does the positioning of warnings). > > > The later changes look as if they fix the syzbot issue independently. > > > > > > Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE > > > (xas_error) after the final xas_store() of the multi-index entry. > > > > > > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> > > > --- > > > > > > mm/khugepaged.c | 23 +---------------------- > > > 1 file changed, 1 insertion(+), 22 deletions(-) > > > > > > --- a/mm/khugepaged.c > > > +++ b/mm/khugepaged.c > > > @@ -1941,16 +1941,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > > > result = SCAN_FAIL; > > > goto xa_locked; > > > } > > > - xas_store(&xas, hpage); > > > - if (xas_error(&xas)) { > > > - /* revert shmem_charge performed > > > - * in the previous condition > > > - */ > > > - mapping->nrpages--; > > > - shmem_uncharge(mapping->host, 1); > > > - result = SCAN_STORE_FAILED; > > > > With this being removed, SCAN_STORE_FAILED should also be removed from > > the scan_result enum and trace event definitions. > > Only if we also remove your use of SCAN_STORE_FAILED in ac492b9c70ca: > what would you want that to say instead? > > I don't care myself for any of those "SCAN" result codes, nor whether they > are few or many: I'd rather have __LINE__ numbers for my own debugging. > > But if people want to remove SCAN_STORE_FAILED now, sure, send a patch; > my intent was to unbreak the breakage. Oh, I had forgotten that I used the code when I rebased v6 of my patches. I have no strong opinions one way or the other. Sorry for the noise. -David