Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)

On 4 Apr 2022, at 9:29, Naoya Horiguchi wrote:

> Hi,
>
> I found that the below VM_BUG_ON_FOLIO is triggered on v5.18-rc1
> (and also reproducible with mmotm on 3/31).
> I have no idea about the bug's mechanism, but it seems not to be
> shared in LKML yet, so let me just share. config.gz is attached.
>
> This reproduces easily, for example by running the migratepages(8)
> command against any running process (such as PID 1).
>
> Could anyone help me solve this?
>
> Thanks,
> Naoya Horiguchi
>
> [   48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
> [   48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
> [   48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
> [   48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
> [   48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
> [   48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
> [   48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
> [   48.249196] ------------[ cut here ]------------
> [   48.251240] kernel BUG at mm/memcontrol.c:6857!
> [   48.253896] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [   48.255377] CPU: 5 PID: 844 Comm: migratepages Tainted: G            E     5.18.0-rc1-v5.18-rc1-220404-1637-000-rc1+ #39
> [   48.258251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> [   48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
> [   48.261914] Code: 48 89 ef e8 5b 2c f7 ff 0f 0b 48 c7 c6 e8 64 5b b9 48 89 ef e8 4a 2c f7 ff 0f 0b 48 c7 c6 28 65 5b b9 48 89 ef e8 39 2c f7 ff <0f> 0b e8 12 79 e0 ff 49 8b 45 10 a8 03 0f 85 d2 00 00 00 65 48 ff
> [   48.268541] RSP: 0018:ffffa19b41b77a20 EFLAGS: 00010286
> [   48.270245] RAX: 0000000000000045 RBX: 0000000000000200 RCX: 0000000000000000
> [   48.272494] RDX: 0000000000000001 RSI: ffffffffb9599561 RDI: 00000000ffffffff
> [   48.274726] RBP: ffffe30f85398000 R08: 0000000000000000 R09: 00000000ffffdfff
> [   48.276969] R10: ffffa19b41b77810 R11: ffffffffb9940d08 R12: 0000000000000000
> [   48.279136] R13: ffffe30f85398000 R14: ffff8a0dc6b23d20 R15: 0000000000000200
> [   48.281151] FS:  00007fadd1182740(0000) GS:ffff8a0efbc80000(0000) knlGS:0000000000000000
> [   48.283422] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   48.285059] CR2: 00007fadd118b090 CR3: 0000000144432005 CR4: 0000000000170ee0
> [   48.286942] Call Trace:
> [   48.287665]  <TASK>
> [   48.288255]  iomap_migrate_page+0x64/0x190
> [   48.289366]  move_to_new_page+0xa3/0x470
> [   48.290448]  ? page_not_mapped+0xa/0x20
> [   48.291491]  ? rmap_walk_file+0xe1/0x1f0
> [   48.292503]  ? try_to_migrate+0x8e/0xd0
> [   48.293524]  migrate_pages+0x166e/0x1870
> [   48.294607]  ? migrate_page+0xe0/0xe0
> [   48.295761]  ? walk_page_range+0x9a/0x110
> [   48.296885]  migrate_to_node+0xea/0x120
> [   48.297873]  do_migrate_pages+0x23c/0x2a0
> [   48.298925]  kernel_migrate_pages+0x3f5/0x470
> [   48.300149]  __x64_sys_migrate_pages+0x19/0x20
> [   48.301371]  do_syscall_64+0x3b/0x90
> [   48.302340]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   48.303789] RIP: 0033:0x7fadd0f0af3d
> [   48.304957] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bb ee 0e 00 f7 d8 64 89 01 48
> [   48.310983] RSP: 002b:00007fff5997e178 EFLAGS: 00000246 ORIG_RAX: 0000000000000100
> [   48.313444] RAX: ffffffffffffffda RBX: 0000556a722bf120 RCX: 00007fadd0f0af3d
> [   48.315763] RDX: 0000556a722bf140 RSI: 0000000000000401 RDI: 000000000000034a
> [   48.318070] RBP: 000000000000034a R08: 0000000000000000 R09: 0000000000000003
> [   48.320370] R10: 0000556a722bf1f0 R11: 0000000000000246 R12: 0000556a722bf1d0
> [   48.322679] R13: 000000000000034a R14: 00007fadd11cec00 R15: 0000556a71a59d50
> [   48.324998]  </TASK>

Is it because the migration code assumes that every THP has
order == HPAGE_PMD_ORDER? Would the patch below fix the issue?

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a2516d31db6c..358b7c11426d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1209,7 +1209,7 @@ static struct page *new_page(struct page *page, unsigned long start)
                struct page *thp;

                thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
-                                        HPAGE_PMD_ORDER);
+                                        thp_order(page));
                if (!thp)
                        return NULL;
                prep_transhuge_page(thp);
diff --git a/mm/migrate.c b/mm/migrate.c
index de175e2fdba5..79e4b36f709a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1547,7 +1547,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
                 */
                gfp_mask &= ~__GFP_RECLAIM;
                gfp_mask |= GFP_TRANSHUGE;
-               order = HPAGE_PMD_ORDER;
+               order = thp_order(page);
        }
        zidx = zone_idx(page_zone(page));
        if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
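
To illustrate the hypothesis behind the patch, here is a minimal userspace
sketch, not kernel code. It assumes the check that fires compares the page
counts of the old and new folios, as the VM_BUG_ON_FOLIO message suggests.
All model_* names are hypothetical, and the value 9 for the PMD order
assumes x86-64 with 4 KiB base pages. Whether this mismatch is what
actually triggers the reported BUG is still the open question.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only: the model_* names are hypothetical, not kernel APIs. */
#define MODEL_PMD_ORDER 9u /* assumed HPAGE_PMD_ORDER (x86-64, 4 KiB pages) */

struct model_folio {
	unsigned int order; /* the folio spans 2^order base pages */
};

/* Number of base pages covered by a folio of the given order. */
static unsigned long model_nr_pages(const struct model_folio *f)
{
	return 1UL << f->order;
}

/* Pre-patch behaviour: the target is always allocated at PMD order. */
static struct model_folio model_alloc_fixed(const struct model_folio *src)
{
	(void)src; /* source order is ignored, which is the suspected problem */
	return (struct model_folio){ .order = MODEL_PMD_ORDER };
}

/* Patched behaviour: the target takes the order of the source. */
static struct model_folio model_alloc_matching(const struct model_folio *src)
{
	return (struct model_folio){ .order = src->order };
}

/* Returns 1 when old and new cover the same number of base pages. */
static int model_sizes_match(const struct model_folio *old,
			     const struct model_folio *dst)
{
	return model_nr_pages(old) == model_nr_pages(dst);
}

/* Prints whether a source of the given order would pass the size check. */
static void model_report(unsigned int src_order)
{
	struct model_folio src = { .order = src_order };
	struct model_folio fixed = model_alloc_fixed(&src);
	struct model_folio same = model_alloc_matching(&src);

	printf("order %u: fixed target %s, matching target %s\n", src_order,
	       model_sizes_match(&src, &fixed) ? "ok" : "MISMATCH",
	       model_sizes_match(&src, &same) ? "ok" : "MISMATCH");
}
```

Calling model_report(4) models a 16-page source: the fixed-order target
covers 512 pages and fails the comparison, while the matching-order target
passes. For a source that is already at PMD order both variants agree, so
the change only matters for large pages of other orders.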


--
Best Regards,
Yan, Zi
