On Thu, 20 Jun 2024, David Hildenbrand wrote:

> >
> > (I do have doubts about Barry's: the "_new" in folio_add_new_anon_rmap()
> > was all about optimizing a known-exclusive case, so it surprises me
> > to see it being extended to non-exclusive; and I worry over how its
> > atomic_set(&page->_mapcount, 0)s can be safe when non-exclusive (but
> > I've never caught up with David's exclusive changes, I'm out of date).
>
> We discussed that a while ago: if we wouldn't be holding the folio lock in the
> "folio == swapcache" at that point (which we do for both do_swap_page() and
> unuse_pte()) things would already be pretty broken.

You're thinking of the non-atomic-ness of atomic_set(): I agree that
the folio lock makes that safe (against other adds; but not against
concurrent removes, which could never occur with the old "_new" usage).

But what I'm worried about is the 0 in atomic_set(&page->_mapcount, 0):
once the folio lock has been dropped, another non-exclusive add could
come in and set _mapcount again to 0 instead of to 1 (mapcount 1 when
it should be 2)?

>
> That's why I added a while ago:
>
> 	if (unlikely(!folio_test_anon(folio))) {
> 		VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> 		/*
> 		 * For a PTE-mapped large folio, we only know that the single
> 		 * PTE is exclusive. Further, __folio_set_anon() might not get
> 		 * folio->index right when not given the address of the head
> 		 * page.
> 		 */
> 		...
>
> We should probably move that VM_WARN_ON_FOLIO() to folio_add_new_anon_rmap()
> and document that it's required in the non-exclusive case.
>
> >
> > But even if those are wrong, I'd expect them to tend towards a mapped
> > page becoming unreclaimable, then "Bad page map" when munmapped,
> > not to any of the double-free symptoms I've actually seen.)
>
> What's the first known-good commit?

I cannot answer that with any certainty: we're on the shifting sands
of next and mm-unstable, and the bug itself is looking rather like
something which gets amplified or masked by other changes - witness
my confident arrival at Barry's series as introducing the badness,
only for a longer run then to contradict that conclusion.

There was no sign of a problem in a 20-hour run of the same load on
rc3-based next-2024-06-13 (plus my posted fixes); there has been no
sign of this problem on 6.10-rc1, rc2, rc3 (but I've not tried rc4
itself, mm.git needing the more urgent attention).

mm-everything-2024-06-15 (minus Chris's mTHP swap) did not show this
problem, but did show a filemap_get_folio() hang.

mm-everything-2024-06-18 (minus Baolin's mTHP shmem swap) is where
I'm hunting it.

Hugh
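
For illustration only, the _mapcount worry raised above can be modelled
in a few lines of userspace C. This is a sketch, not kernel code:
struct model_page and the model_*() helpers are hypothetical names,
C11 atomics stand in for the kernel's atomic_t, and the two adds are
run one after the other, as if the folio lock had been dropped and
retaken between them. It assumes the kernel convention that _mapcount
is stored biased by -1 (so -1 means unmapped and 0 means one mapping).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the page's _mapcount field. */
struct model_page {
	atomic_int _mapcount;	/* biased by -1: -1 == no mappings */
};

void model_page_init(struct model_page *page)
{
	atomic_init(&page->_mapcount, -1);
}

/* Number of mappings, undoing the -1 bias. */
int model_mapcount(struct model_page *page)
{
	int biased = atomic_load(&page->_mapcount);

	assert(biased >= -1);
	return biased + 1;
}

/* "_new" style add: overwrites the count, assuming no other mapping exists. */
void model_add_by_set(struct model_page *page)
{
	atomic_store(&page->_mapcount, 0);
}

/* Ordinary add: counts on top of whatever mappings already exist. */
void model_add_by_inc(struct model_page *page)
{
	atomic_fetch_add(&page->_mapcount, 1);
}

/* Two non-exclusive adds, both using the overwrite; returns the mapcount. */
int model_two_adds_by_set(void)
{
	struct model_page page;

	model_page_init(&page);
	model_add_by_set(&page);	/* first mapping: _mapcount = 0 */
	model_add_by_set(&page);	/* second mapping: _mapcount = 0 again */
	return model_mapcount(&page);
}

/* The same two adds, both using the increment; returns the mapcount. */
int model_two_adds_by_inc(void)
{
	struct model_page page;

	model_page_init(&page);
	model_add_by_inc(&page);	/* first mapping: _mapcount = 0 */
	model_add_by_inc(&page);	/* second mapping: _mapcount = 1 */
	return model_mapcount(&page);
}
```

In this model, model_two_adds_by_set() reports a mapcount of 1 although
two mappings were added, while model_two_adds_by_inc() reports 2. The
serialization by the lock is not what goes wrong here; the lost count
comes purely from the second add overwriting rather than incrementing.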