Re: [PATCH v2 0/9] support large folio swap-out and swap-in for shmem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 20.06.24 18:27, Hugh Dickins wrote:
On Thu, 20 Jun 2024, David Hildenbrand wrote:


(I do have doubts about Barry's: the "_new" in folio_add_new_anon_rmap()
was all about optimizing a known-exclusive case, so it surprises me
to see it being extended to non-exclusive; and I worry over how its
atomic_set(&page->_mapcount, 0)s can be safe when non-exclusive (but
I've never caught up with David's exclusive changes, I'm out of date).

We discussed that a while ago: if we wouldn't be holding the folio lock in the
"folio == swapcache" at that point (which we do for both do_swap_page() and
unuse_pte()) things would already be pretty broken.

You're thinking of the non-atomic-ness of atomic_set(): I agree that
the folio lock makes that safe (against other adds; but not against
concurrent removes, which could never occur with the old "_new" usage).

But what I'm worried about is the 0 in atomic_set(&page->_mapcount, 0):
once the folio lock has been dropped, another non-exclusive add could
come in and set _mapcount again to 0 instead of to 1 (mapcount 1 when
it should be 2)?

That would mean that we have someone checking folio_test_anon() before doing the folio_add_new*, and *not* performing that check under folio lock such that we can race with another folio_add_new*() call. (or not checking for folio_test_anon() at all, which would be similarly broken)

When thinking about this in the past I concluded that such code would be inherently racy and wrong already and the existing VM_WARN_ON_FOLIO() would have revealed that race.

Whoever calls folio_add_new*() either (a) just allocated that thing and can be sure that nobody else does something concurrently -- no need for the folio lock; or (b) calls it only when !folio_test_anon(), and synchronizes against any other possible races using the folio lock.

swapin code falls into category (b), everything else we have in the tree into category (a).

So far my thinking :)



That's I added a while ago:

if (unlikely(!folio_test_anon(folio))) {
	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
	/*
	 * For a PTE-mapped large folio, we only know that the single
	 * PTE is exclusive. Further, __folio_set_anon() might not get
	 * folio->index right when not given the address of the head
	 * page.
	 */
	...

We should probably move that VM_WARN_ON_FOLIO() to folio_add_new_anon_rmap()
and document that it's required in the non-exclusive case.


But even if those are wrong, I'd expect them to tend towards a mapped
page becoming unreclaimable, then "Bad page map" when munmapped,
not to any of the double-free symptoms I've actually seen.)

What's the first known-good commit?

I cannot answer that with any certainty: we're on the shifting sands
of next and mm-unstable, and the bug itself is looking rather like
something which gets amplified or masked by other changes - witness
my confident arrival at Barry's series as introducing the badness,
only for a longer run then to contradict that conclusion.

There was no sign of a problem in a 20-hour run of the same load on
rc3-based next-2024-06-13 (plus my posted fixes); there has been no
sign of this problem on 6.10-rc1, rc2, rc3 (but I've not tried rc4
itself, mm.git needing the more urgent attention). mm-everything-
2024-06-15 (minus Chris's mTHP swap) did not show this problem, but
did show filemap_get_folio() hang. mm-everything-2024-06-18 (minus
Baolin's mTHP shmem swap) is where I'm hunting it.

Good, as long as we suspect only something  very recent is affected.

There was this report against rc3 upstream:

https://lore.kernel.org/all/CA+G9fYvKmr84WzTArmfaypKM9+=Aw0uXCtuUKHQKFCNMGJyOgQ@xxxxxxxxxxxxxx/

It's only been observed once, and who knows what's happening in that NFS setup. I suspected NUMA hinting code, but I am not sure if that really explains the issue ...

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux