On 2024/12/19 20:47, Donet Tom wrote:
> The migration selftest is currently failing for shared anonymous mappings due to a race condition. During migration, the source folio's PTE is unmapped by nuking the PTE, flushing the TLB, and then marking the page for migration (by creating the swap entries).
>
> The issue arises when, immediately after the PTE is nuked and the TLB is flushed, but before the page is marked for migration, another thread accesses the page. This triggers a page fault, and the page fault handler invokes do_pte_missing() instead of do_swap_page(), as the page is not yet marked for migration. In the fault handling path, do_pte_missing() calls __do_fault() -> shmem_fault() -> shmem_get_folio_gfp() -> filemap_get_entry(). This eventually calls folio_try_get(), incrementing the reference count of the folio undergoing migration. The thread then blocks on folio_lock(), as the migration path holds the lock. As a result, migration fails in __migrate_folio(), which expects the folio's reference count to be 2; the extra reference taken by the fault handler causes the failure.
>
> In short, the problem is that the page can be accessed after the PTE is nuked but before it is marked for migration. To address this, we have updated the logic to first nuke the PTE, then mark the page for migration, and only then flush the TLB. With this patch, if the page is accessed immediately after nuking the PTE, the TLB entry is still valid, so no fault occurs. After marking the page for migration,
IMO, I don't think this assumption is correct. At this point, the TLB entry might also be evicted, so a page fault could still occur. It's just a matter of probability.
Additionally, IIUC, if another thread accessing the shmem folio causes the migration to fail, I think that is expected behavior, and a migration failure is not a fatal issue?