Re: [PATCH hotfix] mm: fix crashes from deferred split racing folio migration

On 2024/7/3 00:15, Hugh Dickins wrote:
On Tue, 2 Jul 2024, Baolin Wang wrote:
On 2024/7/2 15:40, Hugh Dickins wrote:
Even on 6.10-rc6, I've been seeing elusive "Bad page state"s (often on
flags when freeing, yet the flags shown are not bad: PG_locked had been
set and cleared??), and VM_BUG_ON_PAGE(page_ref_count(page) == 0)s from
deferred_split_scan()'s folio_put(), and a variety of other BUG and WARN
symptoms implying double free by deferred split and large folio migration.

6.7 commit 9bcef5973e31 ("mm: memcg: fix split queue list crash when large
folio migration") was right to fix the memcg-dependent locking broken in
85ce2c517ade ("memcontrol: only transfer the memcg data for migration"),
but missed a subtlety of deferred_split_scan(): it moves folios to its own
local list to work on them without split_queue_lock, during which time
folio->_deferred_list is not empty, but even the "right" lock does nothing
to secure the folio and the list it is on.

Fortunately, deferred_split_scan() is careful to use folio_try_get(): so
folio_migrate_mapping() can avoid the race by folio_undo_large_rmappable()
while the old folio's reference count is temporarily frozen to 0 - adding
such a freeze in the !mapping case too (originally, folio lock and

(I should have added "isolation and" into that list of conditions.)

unmapping and no swap cache left an anon folio unreachable, so no freezing
was needed there: but the deferred split queue offers a way to reach it).

Thanks Hugh.

But after reading your analysis, I am concerned that folio_undo_large_rmappable() and deferred_split_scan() may still race on the local list, even with your patch.

Suppose folio A has already been moved onto the local list in
deferred_split_scan() by thread A, which then fails folio_trylock() and
drops its reference. At the same time, folio A can be frozen by
another thread B in folio_migrate_mapping(). In that case,
folio_undo_large_rmappable() would remove folio A from the local list without
*any* lock protection, racing with the local list iteration
in deferred_split_scan().

It's a good doubt to raise, but I think we're okay: because Kirill
designed the deferred split list (and its temporary local list) to
be safe in that way.

You're right that if thread B's freeze to 0 wins the race, thread B
will be messing with a list on thread A's stack while thread A is
quite possibly running; but thread A will not leave that stack frame
without taking again the split_queue_lock which thread B holds while
removing from the list.

We would certainly not want to introduce such a subtlety right now!
But never mind page migration, this is how it has always worked when
racing with the folio being freed at the same time - maybe
deferred_split_scan() wins the freeing race and is the one to remove
folio from deferred split list, or maybe the other freer does that.

Yes, thanks for the explanation. And after thinking about it more, list_for_each_entry_safe() in deferred_split_scan() keeps the iteration safe when the current entry is removed, so I think this is fine.

I forget whether there was an initial flurry of races to be fixed when
it came in, but it has been stable against concurrent freeing for years.

Please think this over again: do not trust my honeyed words!


Anyway, I think this patch can still fix some possible races. Feel free to
add:
Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>

Thanks, but I certainly don't want this to go into the tree if it's
still flawed as you suggest.

Now I have no doubts about this fix, so please keep my Reviewed-by tag, thanks.



