Re: BUG_ON() in pfn_swap_entry_to_page()

On 2024-04-25 5:32, David Hildenbrand wrote:
On 24.04.24 21:45, Felix Kuehling wrote:
Sorry for top-posting. I'm resurrecting an old thread here because I think I ran into the same problem with this assertion failing on Linux 6.7:

static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
{
         struct page *p = pfn_to_page(swp_offset_pfn(entry));

         /*
          * Any use of migration entries may only occur while the
          * corresponding page is locked
          */
-->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));

         return p;
}

It looks like this thread just fizzled two years ago. Did anything ever come of this?

Maybe I should add that I saw this in a pre-silicon test environment. I've never seen this on real hardware. Maybe something timing-sensitive.

In the past, it indicated a swp pte corruption that would, e.g., mess up the stored PFN or the swap entry type.

On which call chain do you see that?
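
To illustrate the point above about corruption of the stored PFN or the entry type, here is a minimal userspace sketch. It is a toy model only: the bit layout, the type numbers, and every toy_* name are assumptions made up for illustration and do not match the kernel's real swp_entry_t encoding. It only shows how a flipped bit in either field can make a check shaped like the BUG_ON() above fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: layout and type values are hypothetical, not the kernel's. */
#define TOY_TYPE_SHIFT  58u
#define TOY_OFFSET_MASK ((UINT64_C(1) << TOY_TYPE_SHIFT) - 1)

enum toy_swp_type {
    TOY_SWP_DEVICE_PRIVATE = 1, /* hypothetical value */
    TOY_SWP_MIGRATION      = 3, /* hypothetical value */
};

typedef struct { uint64_t val; } toy_swp_entry_t;

/* Pack a type and a PFN into one 64-bit toy entry. */
static toy_swp_entry_t toy_swp_entry(unsigned type, uint64_t pfn)
{
    toy_swp_entry_t e = {
        ((uint64_t)type << TOY_TYPE_SHIFT) | (pfn & TOY_OFFSET_MASK)
    };
    return e;
}

static unsigned toy_swp_type(toy_swp_entry_t e)
{
    return (unsigned)(e.val >> TOY_TYPE_SHIFT);
}

static uint64_t toy_swp_pfn(toy_swp_entry_t e)
{
    return e.val & TOY_OFFSET_MASK;
}

static bool toy_is_migration_entry(toy_swp_entry_t e)
{
    return toy_swp_type(e) == TOY_SWP_MIGRATION;
}

/*
 * Same shape as the BUG_ON() condition: returns true when the entry claims
 * to be a migration entry but the page it points at is not locked.
 * page_locked[] stands in for PageLocked(); PFNs outside the toy array are
 * treated as "not locked".
 */
static bool toy_would_bug(toy_swp_entry_t e, const bool *page_locked,
                          uint64_t npages)
{
    uint64_t pfn = toy_swp_pfn(e);
    bool locked = pfn < npages && page_locked[pfn];

    return toy_is_migration_entry(e) && !locked;
}

/* Walk through both corruption cases with a four-page toy memory map. */
static void toy_demo(void)
{
    const bool locked[4] = { false, true, false, false };

    /* Intact migration entry pointing at the locked page: check passes. */
    toy_swp_entry_t good = toy_swp_entry(TOY_SWP_MIGRATION, 1);
    assert(!toy_would_bug(good, locked, 4));

    /* Corrupted PFN (1 -> 2): now points at an unlocked page. */
    toy_swp_entry_t bad_pfn = good;
    bad_pfn.val ^= 3;
    assert(toy_would_bug(bad_pfn, locked, 4));

    /* Corrupted type (device private -> migration) on an unlocked page. */
    toy_swp_entry_t bad_type = toy_swp_entry(TOY_SWP_DEVICE_PRIVATE, 2);
    assert(!toy_would_bug(bad_type, locked, 4));
    bad_type.val ^= (uint64_t)2 << TOY_TYPE_SHIFT;
    assert(toy_would_bug(bad_type, locked, 4));
}
```

Calling toy_demo() runs through both cases. In this model neither corrupted entry is distinguishable from a real migration entry at the point of the check, which is why the call chain matters for narrowing down where the entry was written.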


This is the backtrace; it's coming from hmm_range_fault. It looks like the swap entries are from migrated DEVICE_PRIVATE pages.

[Apr 3 20:11] ------------[ cut here ]------------
[  +0.000041] kernel BUG at include/linux/swapops.h:466!
[  +0.000691] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  +0.000342] CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted 6.7.0-kfd-compute-rocm-npi-186 #1
[  +0.000556] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  +0.000703] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu]
[  +0.000501] RIP: 0010:migration_entry_wait_on_locked+0x26b/0x2b0
[ +0.000389] Code: fe ff ff 48 8d 7c 24 07 e8 02 7e f0 ff e9 58 fe ff ff 48 8b 43 08 a8 01 75 3f 66 90 48 89 d8 48 8b 00 a8 01 0f 85 f1 fd ff ff <0f> 0b 48 8d 58 ff e9 f7 fd ff ff 48 89 d8 f7 c3 ff 0f 00 00 75 df
[  +0.001161] RSP: 0018:ffffb211c01bb788 EFLAGS: 00010246
[  +0.000339] RAX: 017fff8000080018 RBX: fffff682c40ce8c0 RCX: 0000000000000001
[  +0.000463] RDX: 0000000000000000 RSI: ffff977a45034840 RDI: 000000000000001a
[  +0.000454] RBP: ffff977a45034840 R08: 68000000001033a3 R09: 0000000000000030
[  +0.000451] R10: ffffb211c01bb6a8 R11: 0000000000000001 R12: ffff977a46bd1318
[  +0.000461] R13: 0000000000000003 R14: 4000000000000000 R15: ffffb211c01bb9b8
[  +0.000454] FS:  0000000000000000(0000) GS:ffff977dafd00000(0000) knlGS:0000000000000000
[  +0.000518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000372] CR2: 00007fa2d1cba000 CR3: 00000001030d2004 CR4: 0000000000770ef0
[  +0.000453] PKRU: 55555554
[  +0.000182] Call Trace:
[  +0.000171]  <TASK>
[  +0.000147]  ? die+0x37/0x90
[  +0.000211]  ? do_trap+0xe0/0x110
[  +0.000221]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000351]  ? do_error_trap+0x98/0x120
[  +0.000252]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000346]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000355]  ? exc_invalid_op+0x52/0x70
[  +0.000254]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000345]  ? asm_exc_invalid_op+0x1a/0x20
[  +0.000274]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000361]  ? migration_entry_wait+0x4e/0x160
[  +0.000293]  ? lock_release+0x119/0x260
[  +0.000255]  migration_entry_wait+0x105/0x160
[  +0.000290]  hmm_vma_walk_pmd+0x822/0x8a0
[  +0.000263]  walk_pgd_range+0x40b/0x900
[  +0.000268]  __walk_page_range+0x205/0x220
[  +0.000267]  walk_page_range+0x13a/0x250
[  +0.000259]  hmm_range_fault+0x5d/0xb0
[  +0.000247]  amdgpu_hmm_range_get_pages+0x144/0x240 [amdgpu]
[  +0.000491]  svm_range_validate_and_map+0x2e5/0x1310 [amdgpu]
[  +0.000479]  ? svm_migrate_ram_to_vram+0x360/0x630 [amdgpu]
[  +0.000453]  svm_range_restore_pages+0xd1e/0x11b0 [amdgpu]
[  +0.000462]  amdgpu_vm_handle_fault+0xc0/0x370 [amdgpu]
[  +0.000428]  gmc_v9_0_process_interrupt+0x10d/0x670 [amdgpu]
[  +0.000463]  ? __wake_up+0x21/0x60
[  +0.000427]  ? find_held_lock+0x2b/0x80
[  +0.000435]  ? process_one_work+0x16a/0x4b0
[  +0.000446]  ? amdgpu_irq_dispatch+0xc2/0x220 [amdgpu]
[  +0.000596]  amdgpu_irq_dispatch+0xc2/0x220 [amdgpu]
[  +0.000579]  amdgpu_ih_process+0x7d/0xe0 [amdgpu]
[  +0.000561]  process_one_work+0x1d1/0x4b0
[  +0.000435]  worker_thread+0x1d3/0x3d0
[  +0.000400]  ? rescuer_thread+0x360/0x360
[  +0.000410]  kthread+0xee/0x120
[  +0.000367]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000452]  ret_from_fork+0x31/0x50
[  +0.000371]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000448]  ret_from_fork_asm+0x11/0x20
[  +0.000390]  </TASK>
[ +0.000281] Modules linked in: amdgpu drm_ttm_helper ttm video wmi drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper fuse ip_tables x_tables virtio_gpu virtio_dma_buf drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks
[  +0.002319] ---[ end trace 0000000000000000 ]---



