On 25.06.24 07:04, Hugh Dickins wrote:
Commit "mm: migrate: split folio_migrate_mapping()" drew attention to "Note, the folio_ref_freeze() is moved out of xas_lock_irq(), Since the folio is already isolated and locked during migration, so suppose that there is no functional change." That was a mistake. Freezing a folio's refcount to 0 is much like taking a spinlock: see how filemap_get_entry() takes rcu_read_lock() then spins around until the folio is unfrozen. If the task freezing is preempted (or calls cond_resched(), as folio_mc_copy() may do), then it risks deadlock: in my case, one CPU in zap_pte_range() with free_swap_and_cache_nr() trying to reclaim swap while PTL is held, all the other CPUs in reclaim spinning for that PTL.
Very much agreed. I think we should add a proper comment to folio_ref_freeze() to spell that out.
-- Cheers, David / dhildenb