On 24.10.24 06:10, Hugh Dickins wrote:
Recent changes are putting more pressure on THP deferred split queues:
under load revealing long-standing races, causing list_del corruptions,
"Bad page state"s and worse (I keep BUGs in both of those, so usually
don't get to see how badly they end up without). The relevant recent
changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin,
improved swap allocation, and underused THP splitting.
The new unlocked list_del_init() in deferred_split_scan() is buggy.
I gave bad advice, it looks plausible since that's a local on-stack
list, but the fact is that it can race with a third party freeing or
migrating the preceding folio (properly unqueueing it with refcount 0
while holding split_queue_lock), thereby corrupting the list linkage.
The obvious answer would be to take split_queue_lock there: but it has
a long history of contention, so I'm reluctant to add to that. Instead,
make sure that there is always one safe (raised refcount) folio before,
by delaying its folio_put(). (And of course I was wrong to suggest
updating split_queue_len without the lock: leave that until the splice.)
And remove two over-eager partially_mapped checks, restoring those tests
to how they were before: if uncharge_folio() or free_tail_page_prepare()
finds _deferred_list non-empty, it's in trouble whether or not that folio
is partially_mapped (and the flag was already cleared in the latter case).
Fixes: dafff3f4c850 ("mm: split underused THPs")
Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
---
Challenging problem and elegant fix!
Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>
At some point you have to tell me your secrets, how you are able to
trigger all these hard-to-trigger problems (6.8 stuff and no reports on
the list) :)
--
Cheers,
David / dhildenb