Re: BUG_ON() in pfn_swap_entry_to_page()

On 2024-04-26 4:49, David Hildenbrand wrote:
On 25.04.24 16:33, Felix Kuehling wrote:


On 2024-04-25 5:32, David Hildenbrand wrote:
On 24.04.24 21:45, Felix Kuehling wrote:
Sorry for top-posting. I'm resurrecting an old thread here because I
think I ran into the same problem with this assertion failing on Linux
6.7:

static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
{
          struct page *p = pfn_to_page(swp_offset_pfn(entry));

          /*
           * Any use of migration entries may only occur while the
           * corresponding page is locked
           */
-->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));

          return p;
}

It looks like this thread just fizzled two years ago. Did anything
ever come of this?

Maybe I should add that I saw this in a pre-silicon test environment.
I've never seen this on real hardware. Maybe something timing-sensitive.

In the past, it indicated a swp pte corruption that would, e.g., mess up
the stored PFN or the swap entry type.

On which call chain do you see that?
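
As a rough illustration of the corruption scenario mentioned above, the following userspace sketch packs a type and a PFN into one 64-bit word and shows that a single flipped bit changes either the decoded type or the decoded PFN. This is a toy layout, not the kernel's actual swp_entry_t encoding: the 5-bit type field and the names toy_encode(), toy_type_of() and toy_pfn_of() are assumptions made purely for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy layout (assumption, NOT the kernel's real encoding):
 * the top 5 bits hold the entry type, the remaining bits hold the PFN. */
#define TOY_TYPE_BITS   5
#define TOY_TYPE_SHIFT  (64 - TOY_TYPE_BITS)
#define TOY_OFFSET_MASK ((UINT64_C(1) << TOY_TYPE_SHIFT) - 1)

/* Pack a (type, pfn) pair into a single 64-bit toy entry. */
static uint64_t toy_encode(unsigned type, uint64_t pfn)
{
	return ((uint64_t)type << TOY_TYPE_SHIFT) | (pfn & TOY_OFFSET_MASK);
}

/* Decode the type field of a toy entry. */
static unsigned toy_type_of(uint64_t entry)
{
	return (unsigned)(entry >> TOY_TYPE_SHIFT);
}

/* Decode the PFN field of a toy entry. */
static uint64_t toy_pfn_of(uint64_t entry)
{
	return entry & TOY_OFFSET_MASK;
}
```

In this model, flipping a bit inside the type field turns one kind of entry into another while the PFN stays the same, and flipping a bit in the offset field leaves the type intact but makes the entry refer to a different page. Either way, a later check that combines "is this a migration entry" with "is that page locked" would be evaluating mismatched data.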


This is the backtrace; it's coming from hmm_range_fault. It looks like the
swap entries are from migrated DEVICE_PRIVATE pages.

Thanks, on which kernel version can you reproduce this?

This is on a branch based on v6.7:

$ git describe HEAD
v6.7-2677-g065851796b25

The branch mostly changes code in drivers. No changes in kernel/ or mm/. A few changes in include/linux, but nothing that looks related to core memory management.




[Apr 3 20:11] ------------[ cut here ]------------
[  +0.000041] kernel BUG at include/linux/swapops.h:466!
[  +0.000691] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  +0.000342] CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted 6.7.0-kfd-compute-rocm-npi-186 #1
[  +0.000556] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  +0.000703] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu]
[  +0.000501] RIP: 0010:migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000389] Code: fe ff ff 48 8d 7c 24 07 e8 02 7e f0 ff e9 58 fe ff ff 48 8b 43 08 a8 01 75 3f 66 90 48 89 d8 48 8b 00 a8 01 0f 85 f1 fd ff ff <0f> 0b 48 8d 58 ff e9 f7 fd ff ff 48 89 d8 f7 c3 ff 0f 00 00 75 df
[  +0.001161] RSP: 0018:ffffb211c01bb788 EFLAGS: 00010246
[  +0.000339] RAX: 017fff8000080018 RBX: fffff682c40ce8c0 RCX: 0000000000000001
[  +0.000463] RDX: 0000000000000000 RSI: ffff977a45034840 RDI: 000000000000001a
[  +0.000454] RBP: ffff977a45034840 R08: 68000000001033a3 R09: 0000000000000030
[  +0.000451] R10: ffffb211c01bb6a8 R11: 0000000000000001 R12: ffff977a46bd1318
[  +0.000461] R13: 0000000000000003 R14: 4000000000000000 R15: ffffb211c01bb9b8
[  +0.000454] FS:  0000000000000000(0000) GS:ffff977dafd00000(0000) knlGS:0000000000000000
[  +0.000518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000453] CR2: 00007fa2d1cba000 CR3: 00000001030d2004 CR4: 0000000000770ef0
[  +0.000453] PKRU: 55555554
[  +0.000182] Call Trace:
[  +0.000171]  <TASK>
[  +0.000147]  ? die+0x37/0x90
[  +0.000211]  ? do_trap+0xe0/0x110
[  +0.000221]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000351]  ? do_error_trap+0x98/0x120
[  +0.000252]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000346]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000355]  ? exc_invalid_op+0x52/0x70
[  +0.000254]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000345]  ? asm_exc_invalid_op+0x1a/0x20
[  +0.000274]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000361]  ? migration_entry_wait+0x4e/0x160
[  +0.000293]  ? lock_release+0x119/0x260
[  +0.000255]  migration_entry_wait+0x105/0x160
[  +0.000290]  hmm_vma_walk_pmd+0x822/0x8a0
[  +0.000263]  walk_pgd_range+0x40b/0x900
[  +0.000268]  __walk_page_range+0x205/0x220

I wonder if that is coming from pmd_migration_entry_wait() or migration_entry_wait() --  the "?" above adds uncertainty :)

This is weird. I only see a call to pmd_migration_entry_wait in hmm_vma_walk_pmd.


Likely it's from migration_entry_wait().

I was first concerned about the lack of PTL in this function, but migration_entry_wait() will take the PTL and re-read the PTE.

So when we call into migration_entry_wait_on_locked(), we are holding the PTL and we verified that we indeed have a migration entry.

So if we fail in migration_entry_wait_on_locked()->pfn_swap_entry_folio(), we verified under PTL and still have a migration entry.

The referenced folio is indeed not locked then.
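
To make the invariant under discussion concrete, here is a minimal userspace sketch of the check. It is a toy model, not kernel code: every toy_* and TOY_* name is hypothetical, and the plain page array stands in for the PFN-to-page lookup. It only illustrates that the assertion depends on two things, the entry type and the lock bit of the referenced page, so it can fire even when the entry has been verified to be a genuine migration entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption): a handful of entry kinds, not the kernel's set. */
enum toy_kind {
	TOY_SWAP,
	TOY_MIGRATION_READ,
	TOY_MIGRATION_WRITE,
	TOY_DEVICE_PRIVATE,
};

/* A page is reduced to its lock bit. */
struct toy_page {
	bool locked;
};

/* A decoded non-present entry: its kind plus the PFN it refers to. */
struct toy_swp_entry {
	enum toy_kind kind;
	unsigned long pfn;
};

/* True for both the read and the write flavour of migration entry. */
static bool toy_is_migration_entry(struct toy_swp_entry e)
{
	return e.kind == TOY_MIGRATION_READ || e.kind == TOY_MIGRATION_WRITE;
}

/* Returns true when the condition guarded by the BUG_ON() holds, i.e.
 * when the entry is not a migration entry or its page is locked. */
static bool toy_invariant_holds(struct toy_swp_entry e,
				const struct toy_page *pages)
{
	const struct toy_page *p = &pages[e.pfn];

	return !(toy_is_migration_entry(e) && !p->locked);
}
```

In this model a device-private entry never trips the check, whatever the lock bit says; only a migration entry that points at an unlocked page does. That matches the conclusion above: if the entry type was confirmed under the PTL, the remaining way to fail is that the referenced page really is unlocked.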

I must admit, I'm not familiar with this code at all, so my observations and questions are probably naive. So is the BUG_ON bad, or is migration_entry_wait_on_locked missing some page locking?

I see that migration_entry_wait_on_locked does a folio_trylock_flag(folio, PG_locked, wait), but _after_ getting the folio with page_folio(pfn_swap_entry_to_page(entry)).

Maybe as a workaround for the team stumbling over this, I'll suggest disabling THP.

Regards,
  Felix



