On 09/03/2024 04:52, Matthew Wilcox wrote:
> On Fri, Mar 08, 2024 at 08:34:15PM -0800, Andrew Morton wrote:
>>
>> We seem to be coming down to the wire on this one - Linus might release
>> 6.8 this weekend.
>>
>> Will simply dropping "mm: allow non-hugetlb large folios to be batch
>> processed" from mm-stable get us out of trouble?
>
> We can add a fix patch which re-narrows the race to the point where it's
> no longer observable.  Obviously we need to figure out what the real
> problem is, but we could be going back a long way.  We've definitely
> found two bugs in the process of investigating the problem (of arguable
> import; the migration one merely wastes memory temporarily and it's not
> entirely clear that the wrong-lock problem definitely causes a crash)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 6b697d33fa5b..7b1d3144391b 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -1012,6 +1012,8 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
>  			free_huge_folio(folio);
>  			continue;
>  		}
> +		if (folio_test_large(folio) && folio_test_large_rmappable(folio))
> +			folio_undo_large_rmappable(folio);
>
>  		__page_cache_release(folio, &lruvec, &flags);
>

I agree this is likely to re-hide the problems, but I haven't actually
tested it on its own, without the other fixes. I'll do some more testing
with your latest patch, and if that doesn't lead anywhere, I'll test with
this patch alone to check that I can no longer reproduce the crashes. If
it hides them, I think this is the best short-term solution we have right
now.