Sorry for top-posting. I'm resurrecting
an old thread here because I think I ran into the same problem
with this assertion failing on Linux 6.7:
static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
{
        struct page *p = pfn_to_page(swp_offset_pfn(entry));

        /*
         * Any use of migration entries may only occur while the
         * corresponding page is locked
         */
-->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));

        return p;
}
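In other words (this is my condensed summary of the lifecycle Alistair
walks through in the quoted thread below, so treat it as a sketch
rather than the exact code):

        /*
         * Expected lifecycle of a migration entry:
         *
         *   lock_page(page); lock_page(newpage);
         *   try_to_migrate(page, 0);       <- installs migration entries
         *   move_to_new_page(newpage, page, mode);
         *   remove_migration_ptes(...);    <- removes them again
         *   unlock_page(newpage); unlock_page(page);
         *
         * Removing a migration entry requires holding the PTL, so any
         * migration entry observed under the PTL should point to a
         * still-locked page. The BUG_ON above fires when that
         * invariant is violated.
         */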
It looks like this thread just fizzled
two years ago. Did anything ever come of this?
I should add that I saw this in a pre-silicon test environment; I've
never seen it on real hardware, so it may be something
timing-sensitive.
Regards,
Felix
On 2022-03-23 23:51, Alistair Popple
wrote:
Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:

> On Wed, Mar 23, 2022 at 11:29:12AM +1100, Alistair Popple wrote:
>> Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:
>>>>> It's not been reported before, but it seems obvious why it
>>>>> triggers.  I think it's ffa65753c431 "mm/migrate.c: rework
>>>>> migration_entry_wait() to not take a pageref".
>>>>
>>>> It's obvious the BUG_ON() is because the page isn't locked, but
>>>> it's less obvious to me at least why we have a migration entry
>>>> pointing to an unlocked page.
>>>
>>> There's a lot of inlining going on, so let's write this out:
>>>
>>> __handle_mm_fault
>>>   handle_pte_fault
>>>     do_swap_page():
>>>
>>>         entry = pte_to_swp_entry(vmf->orig_pte);
>>>         if (unlikely(non_swap_entry(entry))) {
>>>                 if (is_migration_entry(entry)) {
>>>                         migration_entry_wait(vma->vm_mm, vmf->pmd,
>>>                                              vmf->address);
>>>
>>> We don't (and can't) have the underlying page locked here.  We're
>>> just waiting for it to finish migration.
>>
>> Why can't the page be locked here? Waiting for migration to finish
>> is equivalent to waiting for the page to be unlocked.
>> __unmap_and_move() follows this rough sequence:
>>
>> __unmap_and_move()
>>         if (!trylock_page(page)) {
>>                 if (!force || mode == MIGRATE_ASYNC)
>>                         goto out;
>>
>>                 /*
>>                  * It's not safe for direct compaction to call lock_page.
>>                  * For example, during page readahead pages are added locked
>>                  * to the LRU. Later, when the IO completes the pages are
>>                  * marked uptodate and unlocked. However, the queueing
>>                  * could be merging multiple pages for one bio (e.g.
>>                  * mpage_readahead). If an allocation happens for the
>>                  * second or third page, the process can end up locking
>>                  * the same page twice and deadlocking. Rather than
>>                  * trying to be clever about what pages can be locked,
>>                  * avoid the use of lock_page for direct compaction
>>                  * altogether.
>>                  */
>>                 if (current->flags & PF_MEMALLOC)
>>                         goto out;
>>
>>                 lock_page(page);
>>         }
>>
>> [...]
>>
>>         if (unlikely(!trylock_page(newpage)))
>>                 goto out_unlock;
>>
>> [...]
>>
>>         } else if (page_mapped(page)) {
>>                 /* Establish migration ptes */
>>                 VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
>>                                page);
>>                 try_to_migrate(page, 0);
>>                 page_was_mapped = true;
>>         }
>>
>>         if (!page_mapped(page))
>>                 rc = move_to_new_page(newpage, page, mode);
>>
>>         if (page_was_mapped)
>>                 remove_migration_ptes(page,
>>                         rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);
>>
>> Both the source and destination pages (page and newpage respectively)
>> are locked prior to installing migration entries in try_to_migrate()
>> and unlocked after migration entries are removed. In order to remove
>> a migration entry the PTL is required. Therefore while holding the
>> PTL for a migration entry it should be impossible to observe it
>> pointing to an unlocked page.
>
> Hm, right.  We check the PTE initially, then take the PTL in
> __migration_entry_wait() and then re-check that it's still a
> migration entry.  I can't think of how we're seeing this ...

Right. The only thing I can think of is if we somehow didn't find all
the migration entries to remove once migration is complete. There have
been a few changes to page_vma_mapped_walk() but I haven't spotted
anything yet that would cause this.
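One more data point for whoever picks this up: the re-check Matthew
describes above still looks roughly the same to me on 6.7. Condensed
and paraphrased from mm/migrate.c (so the exact helpers and signatures
may differ):

        migration_entry_wait(mm, pmd, address)
                ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
                pte = ptep_get(ptep);
                ...
                /* re-check under the PTL: bail out unless the PTE is
                 * still a migration entry */
                entry = pte_to_swp_entry(pte);
                if (!is_migration_entry(entry))
                        goto out;
                /* looks up the page via pfn_swap_entry_to_page(),
                 * which is where the BUG_ON fires if that page is
                 * unlocked */
                migration_entry_wait_on_locked(entry, ...);

So when the BUG_ON fires, the PTE was still a migration entry under
the PTL while the target page was already unlocked, which fits
Alistair's theory of a migration entry being left behind after
migration completed.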