Sorry for top-posting. I'm resurrecting
an old thread here because I think I ran into the same problem
with this assertion failing on Linux 6.7:
static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
{
        struct page *p = pfn_to_page(swp_offset_pfn(entry));

        /*
         * Any use of migration entries may only occur while the
         * corresponding page is locked
         */
-->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));

        return p;
}
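In other words (this is my condensed summary of the lifecycle Alistair
walks through in the quoted thread below, so treat it as a sketch
rather than the exact code):

        /*
         * Expected lifecycle of a migration entry:
         *
         *   lock_page(page); lock_page(newpage);
         *   try_to_migrate(page, 0);       <- installs migration entries
         *   move_to_new_page(newpage, page, mode);
         *   remove_migration_ptes(...);    <- removes them again
         *   unlock_page(newpage); unlock_page(page);
         *
         * Removing a migration entry requires holding the PTL, so any
         * migration entry observed under the PTL should point to a
         * still-locked page. The BUG_ON above fires when that
         * invariant is violated.
         */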
It looks like this thread just fizzled
two years ago. Did anything ever come of this?
I should add that I saw this in a pre-silicon test environment; I've
never seen it on real hardware, so it may be something
timing-sensitive.
Regards,
Felix
On 2022-03-23 23:51, Alistair Popple
wrote:
Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:

> On Wed, Mar 23, 2022 at 11:29:12AM +1100, Alistair Popple wrote:
>> Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:
>>>>> It's not been reported before, but it seems obvious why it
>>>>> triggers.  I think it's ffa65753c431 "mm/migrate.c: rework
>>>>> migration_entry_wait() to not take a pageref".
>>>>
>>>> It's obvious the BUG_ON() is because the page isn't locked, but
>>>> it's less obvious to me at least why we have a migration entry
>>>> pointing to an unlocked page.
>>>
>>> There's a lot of inlining going on, so let's write this out:
>>>
>>> __handle_mm_fault
>>>   handle_pte_fault
>>>     do_swap_page():
>>>
>>>         entry = pte_to_swp_entry(vmf->orig_pte);
>>>         if (unlikely(non_swap_entry(entry))) {
>>>                 if (is_migration_entry(entry)) {
>>>                         migration_entry_wait(vma->vm_mm, vmf->pmd,
>>>                                              vmf->address);
>>>
>>> We don't (and can't) have the underlying page locked here.  We're
>>> just waiting for it to finish migration.
>>
>> Why can't the page be locked here? Waiting for migration to finish
>> is equivalent to waiting for the page to be unlocked.
>> __unmap_and_move() follows this rough sequence:
>>
>> __unmap_and_move()
>>         if (!trylock_page(page)) {
>>                 if (!force || mode == MIGRATE_ASYNC)
>>                         goto out;
>>
>>                 /*
>>                  * It's not safe for direct compaction to call lock_page.
>>                  * For example, during page readahead pages are added locked
>>                  * to the LRU. Later, when the IO completes the pages are
>>                  * marked uptodate and unlocked. However, the queueing
>>                  * could be merging multiple pages for one bio (e.g.
>>                  * mpage_readahead). If an allocation happens for the
>>                  * second or third page, the process can end up locking
>>                  * the same page twice and deadlocking. Rather than
>>                  * trying to be clever about what pages can be locked,
>>                  * avoid the use of lock_page for direct compaction
>>                  * altogether.
>>                  */
>>                 if (current->flags & PF_MEMALLOC)
>>                         goto out;
>>
>>                 lock_page(page);
>>         }
>>
>> [...]
>>
>>         if (unlikely(!trylock_page(newpage)))
>>                 goto out_unlock;
>>
>> [...]
>>
>>         } else if (page_mapped(page)) {
>>                 /* Establish migration ptes */
>>                 VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
>>                                page);
>>                 try_to_migrate(page, 0);
>>                 page_was_mapped = true;
>>         }
>>
>>         if (!page_mapped(page))
>>                 rc = move_to_new_page(newpage, page, mode);
>>
>>         if (page_was_mapped)
>>                 remove_migration_ptes(page,
>>                         rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);
>>
>> Both the source and destination pages (page and newpage respectively)
>> are locked prior to installing migration entries in try_to_migrate()
>> and unlocked after migration entries are removed. In order to remove
>> a migration entry the PTL is required. Therefore while holding the
>> PTL for a migration entry it should be impossible to observe it
>> pointing to an unlocked page.
>
> Hm, right.  We check the PTE initially, then take the PTL in
> __migration_entry_wait() and then re-check that it's still a
> migration entry.  I can't think of how we're seeing this ...

Right. The only thing I can think of is if we somehow didn't find all
the migration entries to remove once migration is complete. There have
been a few changes to page_vma_mapped_walk() but I haven't spotted
anything yet that would cause this.
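One more data point for whoever picks this up: the re-check Matthew
describes above still looks roughly the same to me on 6.7. Condensed
and paraphrased from mm/migrate.c (so the exact helpers and signatures
may differ):

        migration_entry_wait(mm, pmd, address)
                ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
                pte = ptep_get(ptep);
                ...
                /* re-check under the PTL: bail out unless the PTE is
                 * still a migration entry */
                entry = pte_to_swp_entry(pte);
                if (!is_migration_entry(entry))
                        goto out;
                /* looks up the page via pfn_swap_entry_to_page(),
                 * which is where the BUG_ON fires if that page is
                 * unlocked */
                migration_entry_wait_on_locked(entry, ...);

So when the BUG_ON fires, the PTE was still a migration entry under
the PTL while the target page was already unlocked, which fits
Alistair's theory of a migration entry being left behind after
migration completed.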