Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:

> On Tue, Mar 22, 2022 at 06:25:02PM +0100, Sebastian Andrzej Siewior wrote:
>> Run into
>>
>> |kernel BUG at include/linux/swapops.h:258!
>> |RIP: 0010:migration_entry_wait_on_locked+0x233/0x2c0
>> |Call Trace:
>> | <TASK>
>> | __handle_mm_fault+0x2bf/0x1810
>> | handle_mm_fault+0x136/0x3e0
>> | exc_page_fault+0x1d2/0x850
>> | asm_exc_page_fault+0x22/0x30
>>
>> This is the BUG_ON() in pfn_swap_entry_to_page(). The box itself has no
>> swapspace configured. Is this something known?
>
> It's not been reported before, but it seems obvious why it triggers.
> I think it's ffa65753c431 "mm/migrate.c: rework migration_entry_wait()
> to not take a pageref".

It's obvious the BUG_ON() fires because the page isn't locked, but it's
less obvious, to me at least, why we have a migration entry pointing to
an unlocked page in the first place.

> There's a lot of inlining going on, so let's write this out:
>
> __handle_mm_fault
>   handle_pte_fault
>     do_swap_page():
>
>         entry = pte_to_swp_entry(vmf->orig_pte);
>         if (unlikely(non_swap_entry(entry))) {
>                 if (is_migration_entry(entry)) {
>                         migration_entry_wait(vma->vm_mm, vmf->pmd,
>                                              vmf->address);
>
> We don't (and can't) have the underlying page locked here. We're just
> waiting for it to finish migration.

Why can't the page be locked here? Waiting for migration to finish is
equivalent to waiting for the page to be unlocked. __unmap_and_move()
follows this rough sequence:

__unmap_and_move()
        if (!trylock_page(page)) {
                if (!force || mode == MIGRATE_ASYNC)
                        goto out;

                /*
                 * It's not safe for direct compaction to call lock_page.
                 * For example, during page readahead pages are added locked
                 * to the LRU. Later, when the IO completes the pages are
                 * marked uptodate and unlocked. However, the queueing
                 * could be merging multiple pages for one bio (e.g.
                 * mpage_readahead). If an allocation happens for the
                 * second or third page, the process can end up locking
                 * the same page twice and deadlocking. Rather than
                 * trying to be clever about what pages can be locked,
                 * avoid the use of lock_page for direct compaction
                 * altogether.
                 */
                if (current->flags & PF_MEMALLOC)
                        goto out;

                lock_page(page);
        }

        [...]

        if (unlikely(!trylock_page(newpage)))
                goto out_unlock;

        [...]

        } else if (page_mapped(page)) {
                /* Establish migration ptes */
                VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
                               page);
                try_to_migrate(page, 0);
                page_was_mapped = true;
        }

        if (!page_mapped(page))
                rc = move_to_new_page(newpage, page, mode);

        if (page_was_mapped)
                remove_migration_ptes(page,
                        rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);

Both the source and destination pages (page and newpage respectively)
are locked prior to installing migration entries in try_to_migrate(),
and they are unlocked only after the migration entries have been
removed. Removing a migration entry requires holding the page table
lock (PTL). Therefore, while holding the PTL it should be impossible
to observe a migration entry pointing to an unlocked page.
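To make that invariant concrete, any fault-side user of a migration
entry is expected to revalidate the PTE under the PTL before
dereferencing the entry. A rough sketch of the pattern (not the actual
mm/migrate.c code; wait_for_migration_entry() is a made-up name and
the actual waiting is elided):

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/swap.h>
#include <linux/swapops.h>

/* Sketch only: look at a migration entry with the PTL held. */
static void wait_for_migration_entry(struct mm_struct *mm, pmd_t *pmd,
                                     unsigned long address)
{
        spinlock_t *ptl;
        pte_t *ptep, pte;
        swp_entry_t entry;
        struct page *page;

        ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
        pte = *ptep;

        /* The PTE may have changed before we took the PTL. */
        if (!is_swap_pte(pte))
                goto out;

        entry = pte_to_swp_entry(pte);
        if (!is_migration_entry(entry))
                goto out;

        /*
         * The PTL is held and the PTE still contains a migration
         * entry, so by the argument above the migration code must
         * still hold the page lock: the entry cannot be removed
         * without taking this PTL, and unlock_page() only happens
         * after remove_migration_ptes().
         */
        page = pfn_swap_entry_to_page(entry);
        VM_BUG_ON_PAGE(!PageLocked(page), page);

        /* Drop the PTL and wait for migration to finish (elided). */
out:
        pte_unmap_unlock(ptep, ptl);
}

If that reasoning is right, the BUG_ON() in pfn_swap_entry_to_page()
should be unreachable from migration_entry_wait() unless something
modifies the PTE or unlocks the page out of order.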
> migration_entry_wait
>   migration_entry_wait_on_locked():
>
>         struct folio *folio = page_folio(pfn_swap_entry_to_page(entry));
>
> pfn_swap_entry_to_page():
>
>         struct page *p = pfn_to_page(swp_offset(entry));
>
>         /*
>          * Any use of migration entries may only occur while the
>          * corresponding page is locked
>          */
>         BUG_ON(is_migration_entry(entry) && !PageLocked(p));
>
> So why did we not hit this earlier? Surely somebody's testing page
> migration? I'm not familiar with the migration code, so maybe this
> assertion is obsolete and it's now legitimate to use a migration entry
> while the page is unlocked.
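For what it's worth, the BUG_ON() doesn't look newly reachable from
this path: if I'm remembering the pre-ffa65753c431 code correctly,
__migration_entry_wait() also called pfn_swap_entry_to_page() with the
PTL held, and only dropped the PTL after pinning the page. Roughly,
from memory and abbreviated, so treat the details with suspicion:

/* Sketch of the old __migration_entry_wait(), before ffa65753c431. */
static void __migration_entry_wait_old(struct mm_struct *mm, pte_t *ptep,
                                       spinlock_t *ptl)
{
        pte_t pte;
        swp_entry_t entry;
        struct page *page;

        spin_lock(ptl);
        pte = *ptep;
        if (!is_swap_pte(pte))
                goto out;

        entry = pte_to_swp_entry(pte);
        if (!is_migration_entry(entry))
                goto out;

        /* Same dereference, same PTL held, same BUG_ON() reachable. */
        page = pfn_swap_entry_to_page(entry);

        /*
         * Pin the page so it can be waited on once the PTL is
         * dropped; if the refcount already hit zero, migration is
         * done and we can just fault again.
         */
        if (!get_page_unless_zero(page))
                goto out;
        pte_unmap_unlock(ptep, ptl);
        put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);
        return;
out:
        pte_unmap_unlock(ptep, ptl);
}

So if the assertion were simply obsolete I'd have expected it to fire
long before ffa65753c431 as well, which makes it worth understanding
why this only shows up now.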