On 09/15/23 10:16, Johannes Weiner wrote:
> On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
> > In next-20230913, I started hitting the following BUG. Seems related
> > to this series. And, if series is reverted I do not see the BUG.
> >
> > I can easily reproduce on a small 16G VM. kernel command line contains
> > "hugetlb_free_vmemmap=on hugetlb_cma=4G". Then run the script,
> > while true; do
> > 	echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> > 	echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
> > 	echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > done
> >
> > For the BUG below I believe it was the first (or second) 1G page creation from
> > CMA that triggered: cma_alloc of 1G.
> >
> > Sorry, have not looked deeper into the issue.
>
> Thanks for the report, and sorry about the breakage!
>
> I was scratching my head at this:
>
> 			/* MIGRATE_ISOLATE page should not go to pcplists */
> 			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
>
> because there is nothing in page isolation that prevents setting
> MIGRATE_ISOLATE on something that's on the pcplist already. So why
> didn't this trigger before already?
>
> Then it clicked: it used to only check the *pcpmigratetype* determined
> by free_unref_page(), which of course mustn't be MIGRATE_ISOLATE.
>
> Pages that get isolated while *already* on the pcplist are fine, and
> are handled properly:
>
> 			mt = get_pcppage_migratetype(page);
>
> 			/* MIGRATE_ISOLATE page should not go to pcplists */
> 			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
>
> 			/* Pageblock could have been isolated meanwhile */
> 			if (unlikely(isolated_pageblocks))
> 				mt = get_pageblock_migratetype(page);
>
> So this was purely a sanity check against the pcpmigratetype cache
> operations. With that gone, we can remove it.

With the patch below applied, a slightly different workload triggers the
following warnings. It seems related, and appears to go away when
reverting the series.
[ 331.595382] ------------[ cut here ]------------
[ 331.596665] page type is 5, passed migratetype is 1 (nr=512)
[ 331.598121] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:662 expand+0x1c9/0x200
[ 331.600549] Modules linked in: rfkill ip6table_filter ip6_tables sunrpc snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_seq 9p snd_seq_device netfs 9pnet_virtio snd_pcm joydev snd_timer virtio_balloon snd soundcore 9pnet virtio_blk virtio_console virtio_net net_failover failover crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring fuse
[ 331.609530] CPU: 2 PID: 935 Comm: bash Tainted: G W 6.6.0-rc1-next-20230913+ #26
[ 331.611603] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 331.613527] RIP: 0010:expand+0x1c9/0x200
[ 331.614492] Code: 89 ef be 07 00 00 00 c6 05 c9 b1 35 01 01 e8 de f7 ff ff 8b 4c 24 30 8b 54 24 0c 48 c7 c7 68 9f 22 82 48 89 c6 e8 97 b3 df ff <0f> 0b e9 db fe ff ff 48 c7 c6 f8 9f 22 82 48 89 df e8 41 e3 fc ff
[ 331.618540] RSP: 0018:ffffc90003c97a88 EFLAGS: 00010086
[ 331.619801] RAX: 0000000000000000 RBX: ffffea0007ff8000 RCX: 0000000000000000
[ 331.621331] RDX: 0000000000000005 RSI: ffffffff8224dce6 RDI: 00000000ffffffff
[ 331.622914] RBP: 00000000001ffe00 R08: 0000000000009ffb R09: 00000000ffffdfff
[ 331.624712] R10: 00000000ffffdfff R11: ffffffff824660c0 R12: ffff88827fffcd80
[ 331.626317] R13: 0000000000000009 R14: 0000000000000200 R15: 000000000000000a
[ 331.627810] FS: 00007f24b3932740(0000) GS:ffff888477c00000(0000) knlGS:0000000000000000
[ 331.630593] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 331.631865] CR2: 0000560a53875018 CR3: 000000017eee8003 CR4: 0000000000370ee0
[ 331.633382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 331.634873] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 331.636324] Call Trace:
[ 331.636934] <TASK>
[ 331.637521] ? expand+0x1c9/0x200
[ 331.638320] ? __warn+0x7d/0x130
[ 331.639116] ? expand+0x1c9/0x200
[ 331.639957] ? report_bug+0x18d/0x1c0
[ 331.640832] ? handle_bug+0x41/0x70
[ 331.641635] ? exc_invalid_op+0x13/0x60
[ 331.642522] ? asm_exc_invalid_op+0x16/0x20
[ 331.643494] ? expand+0x1c9/0x200
[ 331.644264] ? expand+0x1c9/0x200
[ 331.645007] rmqueue_bulk+0xf4/0x530
[ 331.645847] get_page_from_freelist+0x3ed/0x1040
[ 331.646837] ? prepare_alloc_pages.constprop.0+0x197/0x1b0
[ 331.647977] __alloc_pages+0xec/0x240
[ 331.648783] alloc_buddy_hugetlb_folio.isra.0+0x6a/0x150
[ 331.649912] __alloc_fresh_hugetlb_folio+0x157/0x230
[ 331.650938] alloc_pool_huge_folio+0xad/0x110
[ 331.651909] set_max_huge_pages+0x17d/0x390
[ 331.652760] nr_hugepages_store_common+0x91/0xf0
[ 331.653825] kernfs_fop_write_iter+0x108/0x1f0
[ 331.654986] vfs_write+0x207/0x400
[ 331.655925] ksys_write+0x63/0xe0
[ 331.656832] do_syscall_64+0x37/0x90
[ 331.657793] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 331.660398] RIP: 0033:0x7f24b3a26e87
[ 331.661342] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 331.665673] RSP: 002b:00007ffccd603de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 331.667541] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f24b3a26e87
[ 331.669197] RDX: 0000000000000005 RSI: 0000560a5381bb50 RDI: 0000000000000001
[ 331.670883] RBP: 0000560a5381bb50 R08: 000000000000000a R09: 00007f24b3abe0c0
[ 331.672536] R10: 00007f24b3abdfc0 R11: 0000000000000246 R12: 0000000000000005
[ 331.674175] R13: 00007f24b3afa520 R14: 0000000000000005 R15: 00007f24b3afa720
[ 331.675841] </TASK>
[ 331.676450] ---[ end trace 0000000000000000 ]---
[ 331.677659] ------------[ cut here ]------------
[ 331.677659] ------------[ cut here ]------------
[ 331.679109] page type is 5, passed migratetype is 1 (nr=512)
[ 331.680376] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:699 del_page_from_free_list+0x137/0x170
[ 331.682314] Modules linked in: rfkill ip6table_filter ip6_tables sunrpc snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_seq 9p snd_seq_device netfs 9pnet_virtio snd_pcm joydev snd_timer virtio_balloon snd soundcore 9pnet virtio_blk virtio_console virtio_net net_failover failover crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring fuse
[ 331.691852] CPU: 2 PID: 935 Comm: bash Tainted: G W 6.6.0-rc1-next-20230913+ #26
[ 331.694026] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 331.696162] RIP: 0010:del_page_from_free_list+0x137/0x170
[ 331.697589] Code: c6 05 a0 b5 35 01 01 e8 b7 fb ff ff 44 89 f1 44 89 e2 48 c7 c7 68 9f 22 82 48 89 c6 b8 01 00 00 00 d3 e0 89 c1 e8 69 b7 df ff <0f> 0b e9 03 ff ff ff 48 c7 c6 a0 9f 22 82 48 89 df e8 13 e7 fc ff
[ 331.702060] RSP: 0018:ffffc90003c97ac8 EFLAGS: 00010086
[ 331.703430] RAX: 0000000000000000 RBX: ffffea0007ff8000 RCX: 0000000000000000
[ 331.705284] RDX: 0000000000000005 RSI: ffffffff8224dce6 RDI: 00000000ffffffff
[ 331.707101] RBP: 00000000001ffe00 R08: 0000000000009ffb R09: 00000000ffffdfff
[ 331.708933] R10: 00000000ffffdfff R11: ffffffff824660c0 R12: 0000000000000001
[ 331.710754] R13: ffff88827fffcd80 R14: 0000000000000009 R15: 0000000000000009
[ 331.712637] FS: 00007f24b3932740(0000) GS:ffff888477c00000(0000) knlGS:0000000000000000
[ 331.714861] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 331.716466] CR2: 0000560a53875018 CR3: 000000017eee8003 CR4: 0000000000370ee0
[ 331.718441] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 331.720372] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 331.723583] Call Trace:
[ 331.724351] <TASK>
[ 331.725045] ? del_page_from_free_list+0x137/0x170
[ 331.726370] ? __warn+0x7d/0x130
[ 331.727326] ? del_page_from_free_list+0x137/0x170
[ 331.728637] ? report_bug+0x18d/0x1c0
[ 331.729688] ? handle_bug+0x41/0x70
[ 331.730707] ? exc_invalid_op+0x13/0x60
[ 331.731798] ? asm_exc_invalid_op+0x16/0x20
[ 331.733007] ? del_page_from_free_list+0x137/0x170
[ 331.734317] ? del_page_from_free_list+0x137/0x170
[ 331.735649] rmqueue_bulk+0xdf/0x530
[ 331.736741] get_page_from_freelist+0x3ed/0x1040
[ 331.738069] ? prepare_alloc_pages.constprop.0+0x197/0x1b0
[ 331.739578] __alloc_pages+0xec/0x240
[ 331.740666] alloc_buddy_hugetlb_folio.isra.0+0x6a/0x150
[ 331.742135] __alloc_fresh_hugetlb_folio+0x157/0x230
[ 331.743521] alloc_pool_huge_folio+0xad/0x110
[ 331.744768] set_max_huge_pages+0x17d/0x390
[ 331.745988] nr_hugepages_store_common+0x91/0xf0
[ 331.747306] kernfs_fop_write_iter+0x108/0x1f0
[ 331.748651] vfs_write+0x207/0x400
[ 331.749735] ksys_write+0x63/0xe0
[ 331.750808] do_syscall_64+0x37/0x90
[ 331.753203] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 331.754857] RIP: 0033:0x7f24b3a26e87
[ 331.756184] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 331.760239] RSP: 002b:00007ffccd603de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 331.761935] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f24b3a26e87
[ 331.763524] RDX: 0000000000000005 RSI: 0000560a5381bb50 RDI: 0000000000000001
[ 331.765102] RBP: 0000560a5381bb50 R08: 000000000000000a R09: 00007f24b3abe0c0
[ 331.766740] R10: 00007f24b3abdfc0 R11: 0000000000000246 R12: 0000000000000005
[ 331.768344] R13: 00007f24b3afa520 R14: 0000000000000005 R15: 00007f24b3afa720
[ 331.769949] </TASK>
[ 331.770559] ---[ end trace 0000000000000000 ]---

-- 
Mike Kravetz

> ---
>
> From b0cb92ed10b40fab0921002effa8b726df245790 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@xxxxxxxxxxx>
> Date: Fri, 15 Sep 2023 09:59:52 -0400
> Subject: [PATCH] mm: page_alloc: remove pcppage migratetype caching fix
>
> Mike reports the following crash in -next:
>
> [ 28.643019] page:ffffea0004fb4280 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13ed0a
> [ 28.645455] flags: 0x200000000000000(node=0|zone=2)
> [ 28.646835] page_type: 0xffffffff()
> [ 28.647886] raw: 0200000000000000 dead000000000100 dead000000000122 0000000000000000
> [ 28.651170] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [ 28.653124] page dumped because: VM_BUG_ON_PAGE(is_migrate_isolate(mt))
> [ 28.654769] ------------[ cut here ]------------
> [ 28.655972] kernel BUG at mm/page_alloc.c:1231!
>
> This VM_BUG_ON() used to check that the cached pcppage_migratetype set
> by free_unref_page() wasn't MIGRATE_ISOLATE.
>
> When I removed the caching, I erroneously changed the assert to check
> that no isolated pages are on the pcplist. This is quite different,
> because pages can be isolated *after* they had been put on the
> freelist already (which is handled just fine).
>
> IOW, this was purely a sanity check on the migratetype caching. With
> that gone, the check should have been removed as well. Do that now.
>
> Reported-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
>  mm/page_alloc.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e3f1c777feed..9469e4660b53 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1207,9 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			count -= nr_pages;
>  			pcp->count -= nr_pages;
>  
> -			/* MIGRATE_ISOLATE page should not go to pcplists */
> -			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> -
>  			__free_one_page(page, pfn, zone, order, mt, FPI_NONE);
>  			trace_mm_page_pcpu_drain(page, order, mt);
>  		} while (count > 0 && !list_empty(list));
> --
> 2.42.0
>