On 4 Apr 2022, at 9:29, Naoya Horiguchi wrote:

> Hi,
>
> I found that the below VM_BUG_ON_FOLIO is triggered on v5.18-rc1
> (and also reproducible with mmotm on 3/31).
> I have no idea about the bug's mechanism, but it seems not to be
> shared in LKML yet, so let me just share. config.gz is attached.
>
> This easily reproduces (for example) by calling migratepages(8)
> command by any of running process (like PID 1).
>
> Could anyone help me solve this?
>
> Thanks,
> Naoya Horiguchi
>
> [ 48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
> [ 48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
> [ 48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
> [ 48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
> [ 48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
> [ 48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
> [ 48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
> [ 48.249196] ------------[ cut here ]------------
> [ 48.251240] kernel BUG at mm/memcontrol.c:6857!
> [ 48.253896] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [ 48.255377] CPU: 5 PID: 844 Comm: migratepages Tainted: G E 5.18.0-rc1-v5.18-rc1-220404-1637-000-rc1+ #39
> [ 48.258251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> [ 48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
> [ 48.261914] Code: 48 89 ef e8 5b 2c f7 ff 0f 0b 48 c7 c6 e8 64 5b b9 48 89 ef e8 4a 2c f7 ff 0f 0b 48 c7 c6 28 65 5b b9 48 89 ef e8 39 2c f7 ff <0f> 0b e8 12 79 e0 ff 49 8b 45 10 a8 03 0f 85 d2 00 00 00 65 48 ff
> [ 48.268541] RSP: 0018:ffffa19b41b77a20 EFLAGS: 00010286
> [ 48.270245] RAX: 0000000000000045 RBX: 0000000000000200 RCX: 0000000000000000
> [ 48.272494] RDX: 0000000000000001 RSI: ffffffffb9599561 RDI: 00000000ffffffff
> [ 48.274726] RBP: ffffe30f85398000 R08: 0000000000000000 R09: 00000000ffffdfff
> [ 48.276969] R10: ffffa19b41b77810 R11: ffffffffb9940d08 R12: 0000000000000000
> [ 48.279136] R13: ffffe30f85398000 R14: ffff8a0dc6b23d20 R15: 0000000000000200
> [ 48.281151] FS: 00007fadd1182740(0000) GS:ffff8a0efbc80000(0000) knlGS:0000000000000000
> [ 48.283422] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 48.285059] CR2: 00007fadd118b090 CR3: 0000000144432005 CR4: 0000000000170ee0
> [ 48.286942] Call Trace:
> [ 48.287665] <TASK>
> [ 48.288255] iomap_migrate_page+0x64/0x190
> [ 48.289366] move_to_new_page+0xa3/0x470
> [ 48.290448] ? page_not_mapped+0xa/0x20
> [ 48.291491] ? rmap_walk_file+0xe1/0x1f0
> [ 48.292503] ? try_to_migrate+0x8e/0xd0
> [ 48.293524] migrate_pages+0x166e/0x1870
> [ 48.294607] ? migrate_page+0xe0/0xe0
> [ 48.295761] ? walk_page_range+0x9a/0x110
> [ 48.296885] migrate_to_node+0xea/0x120
> [ 48.297873] do_migrate_pages+0x23c/0x2a0
> [ 48.298925] kernel_migrate_pages+0x3f5/0x470
> [ 48.300149] __x64_sys_migrate_pages+0x19/0x20
> [ 48.301371] do_syscall_64+0x3b/0x90
> [ 48.302340] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 48.303789] RIP: 0033:0x7fadd0f0af3d
> [ 48.304957] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bb ee 0e 00 f7 d8 64 89 01 48
> [ 48.310983] RSP: 002b:00007fff5997e178 EFLAGS: 00000246 ORIG_RAX: 0000000000000100
> [ 48.313444] RAX: ffffffffffffffda RBX: 0000556a722bf120 RCX: 00007fadd0f0af3d
> [ 48.315763] RDX: 0000556a722bf140 RSI: 0000000000000401 RDI: 000000000000034a
> [ 48.318070] RBP: 000000000000034a R08: 0000000000000000 R09: 0000000000000003
> [ 48.320370] R10: 0000556a722bf1f0 R11: 0000000000000246 R12: 0000556a722bf1d0
> [ 48.322679] R13: 000000000000034a R14: 00007fadd11cec00 R15: 0000556a71a59d50
> [ 48.324998] </TASK>

Is it because migration code assumes all THPs have order=HPAGE_PMD_ORDER?
Would the patch below fix the issue?

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a2516d31db6c..358b7c11426d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1209,7 +1209,7 @@ static struct page *new_page(struct page *page, unsigned long start)
 		struct page *thp;
 
 		thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
-					 HPAGE_PMD_ORDER);
+					 thp_order(page));
 		if (!thp)
 			return NULL;
 		prep_transhuge_page(thp);
diff --git a/mm/migrate.c b/mm/migrate.c
index de175e2fdba5..79e4b36f709a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1547,7 +1547,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 		 */
 		gfp_mask &= ~__GFP_RECLAIM;
 		gfp_mask |= GFP_TRANSHUGE;
-		order = HPAGE_PMD_ORDER;
+		order = thp_order(page);
 	}
 	zidx = zone_idx(page_zone(page));
 	if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)

--
Best Regards,
Yan, Zi