Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

On 09/15/23 10:16, Johannes Weiner wrote:
> On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
> > In next-20230913, I started hitting the following BUG.  It seems related
> > to this series: with the series reverted, I do not see the BUG.
> > 
> > I can easily reproduce on a small 16G VM.  kernel command line contains
> > "hugetlb_free_vmemmap=on hugetlb_cma=4G".  Then run the script,
> > while true; do
> >  echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >  echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
> >  echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > done
> > 
> > For the BUG below I believe it was the first (or second) 1G page creation from
> > CMA that triggered:  cma_alloc of 1G.
> > 
> > Sorry, I have not looked deeper into the issue.
> 
> Thanks for the report, and sorry about the breakage!
> 
> I was scratching my head at this:
> 
>                         /* MIGRATE_ISOLATE page should not go to pcplists */
>                         VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> 
> because there is nothing in page isolation that prevents setting
> MIGRATE_ISOLATE on something that's on the pcplist already. So why
> didn't this trigger before already?
> 
> Then it clicked: it used to only check the *pcpmigratetype* determined
> by free_unref_page(), which of course mustn't be MIGRATE_ISOLATE.
> 
> Pages that get isolated while *already* on the pcplist are fine, and
> are handled properly:
> 
>                         mt = get_pcppage_migratetype(page);
> 
>                         /* MIGRATE_ISOLATE page should not go to pcplists */
>                         VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> 
>                         /* Pageblock could have been isolated meanwhile */
>                         if (unlikely(isolated_pageblocks))
>                                 mt = get_pageblock_migratetype(page);
> 
> So this was purely a sanity check against the pcpmigratetype cache
> operations. With that gone, we can remove it.

With the patch below applied, a slightly different workload triggers the
following warnings.  They seem related to the series, and appear to go away
when the series is reverted.

[  331.595382] ------------[ cut here ]------------
[  331.596665] page type is 5, passed migratetype is 1 (nr=512)
[  331.598121] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:662 expand+0x1c9/0x200
[  331.600549] Modules linked in: rfkill ip6table_filter ip6_tables sunrpc snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_seq 9p snd_seq_device netfs 9pnet_virtio snd_pcm joydev snd_timer virtio_balloon snd soundcore 9pnet virtio_blk virtio_console virtio_net net_failover failover crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring fuse
[  331.609530] CPU: 2 PID: 935 Comm: bash Tainted: G        W          6.6.0-rc1-next-20230913+ #26
[  331.611603] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[  331.613527] RIP: 0010:expand+0x1c9/0x200
[  331.614492] Code: 89 ef be 07 00 00 00 c6 05 c9 b1 35 01 01 e8 de f7 ff ff 8b 4c 24 30 8b 54 24 0c 48 c7 c7 68 9f 22 82 48 89 c6 e8 97 b3 df ff <0f> 0b e9 db fe ff ff 48 c7 c6 f8 9f 22 82 48 89 df e8 41 e3 fc ff
[  331.618540] RSP: 0018:ffffc90003c97a88 EFLAGS: 00010086
[  331.619801] RAX: 0000000000000000 RBX: ffffea0007ff8000 RCX: 0000000000000000
[  331.621331] RDX: 0000000000000005 RSI: ffffffff8224dce6 RDI: 00000000ffffffff
[  331.622914] RBP: 00000000001ffe00 R08: 0000000000009ffb R09: 00000000ffffdfff
[  331.624712] R10: 00000000ffffdfff R11: ffffffff824660c0 R12: ffff88827fffcd80
[  331.626317] R13: 0000000000000009 R14: 0000000000000200 R15: 000000000000000a
[  331.627810] FS:  00007f24b3932740(0000) GS:ffff888477c00000(0000) knlGS:0000000000000000
[  331.630593] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  331.631865] CR2: 0000560a53875018 CR3: 000000017eee8003 CR4: 0000000000370ee0
[  331.633382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  331.634873] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  331.636324] Call Trace:
[  331.636934]  <TASK>
[  331.637521]  ? expand+0x1c9/0x200
[  331.638320]  ? __warn+0x7d/0x130
[  331.639116]  ? expand+0x1c9/0x200
[  331.639957]  ? report_bug+0x18d/0x1c0
[  331.640832]  ? handle_bug+0x41/0x70
[  331.641635]  ? exc_invalid_op+0x13/0x60
[  331.642522]  ? asm_exc_invalid_op+0x16/0x20
[  331.643494]  ? expand+0x1c9/0x200
[  331.644264]  ? expand+0x1c9/0x200
[  331.645007]  rmqueue_bulk+0xf4/0x530
[  331.645847]  get_page_from_freelist+0x3ed/0x1040
[  331.646837]  ? prepare_alloc_pages.constprop.0+0x197/0x1b0
[  331.647977]  __alloc_pages+0xec/0x240
[  331.648783]  alloc_buddy_hugetlb_folio.isra.0+0x6a/0x150
[  331.649912]  __alloc_fresh_hugetlb_folio+0x157/0x230
[  331.650938]  alloc_pool_huge_folio+0xad/0x110
[  331.651909]  set_max_huge_pages+0x17d/0x390
[  331.652760]  nr_hugepages_store_common+0x91/0xf0
[  331.653825]  kernfs_fop_write_iter+0x108/0x1f0
[  331.654986]  vfs_write+0x207/0x400
[  331.655925]  ksys_write+0x63/0xe0
[  331.656832]  do_syscall_64+0x37/0x90
[  331.657793]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  331.660398] RIP: 0033:0x7f24b3a26e87
[  331.661342] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  331.665673] RSP: 002b:00007ffccd603de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  331.667541] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f24b3a26e87
[  331.669197] RDX: 0000000000000005 RSI: 0000560a5381bb50 RDI: 0000000000000001
[  331.670883] RBP: 0000560a5381bb50 R08: 000000000000000a R09: 00007f24b3abe0c0
[  331.672536] R10: 00007f24b3abdfc0 R11: 0000000000000246 R12: 0000000000000005
[  331.674175] R13: 00007f24b3afa520 R14: 0000000000000005 R15: 00007f24b3afa720
[  331.675841]  </TASK>
[  331.676450] ---[ end trace 0000000000000000 ]---
[  331.677659] ------------[ cut here ]------------
[  331.679109] page type is 5, passed migratetype is 1 (nr=512)
[  331.680376] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:699 del_page_from_free_list+0x137/0x170
[  331.682314] Modules linked in: rfkill ip6table_filter ip6_tables sunrpc snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_seq 9p snd_seq_device netfs 9pnet_virtio snd_pcm joydev snd_timer virtio_balloon snd soundcore 9pnet virtio_blk virtio_console virtio_net net_failover failover crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring fuse
[  331.691852] CPU: 2 PID: 935 Comm: bash Tainted: G        W          6.6.0-rc1-next-20230913+ #26
[  331.694026] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[  331.696162] RIP: 0010:del_page_from_free_list+0x137/0x170
[  331.697589] Code: c6 05 a0 b5 35 01 01 e8 b7 fb ff ff 44 89 f1 44 89 e2 48 c7 c7 68 9f 22 82 48 89 c6 b8 01 00 00 00 d3 e0 89 c1 e8 69 b7 df ff <0f> 0b e9 03 ff ff ff 48 c7 c6 a0 9f 22 82 48 89 df e8 13 e7 fc ff
[  331.702060] RSP: 0018:ffffc90003c97ac8 EFLAGS: 00010086
[  331.703430] RAX: 0000000000000000 RBX: ffffea0007ff8000 RCX: 0000000000000000
[  331.705284] RDX: 0000000000000005 RSI: ffffffff8224dce6 RDI: 00000000ffffffff
[  331.707101] RBP: 00000000001ffe00 R08: 0000000000009ffb R09: 00000000ffffdfff
[  331.708933] R10: 00000000ffffdfff R11: ffffffff824660c0 R12: 0000000000000001
[  331.710754] R13: ffff88827fffcd80 R14: 0000000000000009 R15: 0000000000000009
[  331.712637] FS:  00007f24b3932740(0000) GS:ffff888477c00000(0000) knlGS:0000000000000000
[  331.714861] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  331.716466] CR2: 0000560a53875018 CR3: 000000017eee8003 CR4: 0000000000370ee0
[  331.718441] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  331.720372] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  331.723583] Call Trace:
[  331.724351]  <TASK>
[  331.725045]  ? del_page_from_free_list+0x137/0x170
[  331.726370]  ? __warn+0x7d/0x130
[  331.727326]  ? del_page_from_free_list+0x137/0x170
[  331.728637]  ? report_bug+0x18d/0x1c0
[  331.729688]  ? handle_bug+0x41/0x70
[  331.730707]  ? exc_invalid_op+0x13/0x60
[  331.731798]  ? asm_exc_invalid_op+0x16/0x20
[  331.733007]  ? del_page_from_free_list+0x137/0x170
[  331.734317]  ? del_page_from_free_list+0x137/0x170
[  331.735649]  rmqueue_bulk+0xdf/0x530
[  331.736741]  get_page_from_freelist+0x3ed/0x1040
[  331.738069]  ? prepare_alloc_pages.constprop.0+0x197/0x1b0
[  331.739578]  __alloc_pages+0xec/0x240
[  331.740666]  alloc_buddy_hugetlb_folio.isra.0+0x6a/0x150
[  331.742135]  __alloc_fresh_hugetlb_folio+0x157/0x230
[  331.743521]  alloc_pool_huge_folio+0xad/0x110
[  331.744768]  set_max_huge_pages+0x17d/0x390
[  331.745988]  nr_hugepages_store_common+0x91/0xf0
[  331.747306]  kernfs_fop_write_iter+0x108/0x1f0
[  331.748651]  vfs_write+0x207/0x400
[  331.749735]  ksys_write+0x63/0xe0
[  331.750808]  do_syscall_64+0x37/0x90
[  331.753203]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  331.754857] RIP: 0033:0x7f24b3a26e87
[  331.756184] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  331.760239] RSP: 002b:00007ffccd603de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  331.761935] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f24b3a26e87
[  331.763524] RDX: 0000000000000005 RSI: 0000560a5381bb50 RDI: 0000000000000001
[  331.765102] RBP: 0000560a5381bb50 R08: 000000000000000a R09: 00007f24b3abe0c0
[  331.766740] R10: 00007f24b3abdfc0 R11: 0000000000000246 R12: 0000000000000005
[  331.768344] R13: 00007f24b3afa520 R14: 0000000000000005 R15: 00007f24b3afa720
[  331.769949]  </TASK>
[  331.770559] ---[ end trace 0000000000000000 ]---
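
To make the numbers in the two warnings easier to read, here is a small
shell sketch that decodes a "page type is X, passed migratetype is Y (nr=N)"
line.  It rests on two assumptions: that the values follow the order of
enum migratetype in include/linux/mmzone.h with both CONFIG_CMA and
CONFIG_MEMORY_ISOLATION enabled (other configs shift the numbers), and
that "page type" is the migratetype of the page's pageblock.  The helper
names migratetype_name and decode_warning are made up for illustration.

```sh
#!/bin/sh
# Map a migratetype number to its name.
# ASSUMPTION: enum order with CONFIG_CMA and CONFIG_MEMORY_ISOLATION set;
# without them MIGRATE_CMA and/or MIGRATE_ISOLATE take different values.
migratetype_name() {
    case "$1" in
        0) echo MIGRATE_UNMOVABLE ;;
        1) echo MIGRATE_MOVABLE ;;
        2) echo MIGRATE_RECLAIMABLE ;;
        3) echo MIGRATE_HIGHATOMIC ;;
        4) echo MIGRATE_CMA ;;
        5) echo MIGRATE_ISOLATE ;;
        *) echo "unknown($1)" ;;
    esac
}

# Pull the three numbers out of one warning line and print them decoded.
# "block" is the first number (assumed to be the pageblock's type),
# "passed" is the migratetype the caller handed in.
decode_warning() {
    page_mt=$(printf '%s\n' "$1" | sed -n 's/.*page type is \([0-9]*\).*/\1/p')
    passed_mt=$(printf '%s\n' "$1" | sed -n 's/.*passed migratetype is \([0-9]*\).*/\1/p')
    nr=$(printf '%s\n' "$1" | sed -n 's/.*(nr=\([0-9]*\)).*/\1/p')
    echo "block=$(migratetype_name "$page_mt") passed=$(migratetype_name "$passed_mt") pages=$nr"
}

decode_warning "page type is 5, passed migratetype is 1 (nr=512)"
```

Under those assumptions, both warnings say that a block marked
MIGRATE_ISOLATE was being handled as MIGRATE_MOVABLE.  nr=512 is 1 << 9,
i.e. one order-9 block, which is 2 MB with 4 KB base pages.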

-- 
Mike Kravetz

> ---
> 
> From b0cb92ed10b40fab0921002effa8b726df245790 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@xxxxxxxxxxx>
> Date: Fri, 15 Sep 2023 09:59:52 -0400
> Subject: [PATCH] mm: page_alloc: remove pcppage migratetype caching fix
> 
> Mike reports the following crash in -next:
> 
> [   28.643019] page:ffffea0004fb4280 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13ed0a
> [   28.645455] flags: 0x200000000000000(node=0|zone=2)
> [   28.646835] page_type: 0xffffffff()
> [   28.647886] raw: 0200000000000000 dead000000000100 dead000000000122 0000000000000000
> [   28.651170] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [   28.653124] page dumped because: VM_BUG_ON_PAGE(is_migrate_isolate(mt))
> [   28.654769] ------------[ cut here ]------------
> [   28.655972] kernel BUG at mm/page_alloc.c:1231!
> 
> This VM_BUG_ON() used to check that the cached pcppage_migratetype set
> by free_unref_page() wasn't MIGRATE_ISOLATE.
> 
> When I removed the caching, I erroneously changed the assert to check
> that no isolated pages are on the pcplist. This is quite different,
> because pages can be isolated *after* they had been put on the
> freelist already (which is handled just fine).
> 
> IOW, this was purely a sanity check on the migratetype caching. With
> that gone, the check should have been removed as well. Do that now.
> 
> Reported-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
>  mm/page_alloc.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e3f1c777feed..9469e4660b53 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1207,9 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			count -= nr_pages;
>  			pcp->count -= nr_pages;
>  
> -			/* MIGRATE_ISOLATE page should not go to pcplists */
> -			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> -
>  			__free_one_page(page, pfn, zone, order, mt, FPI_NONE);
>  			trace_mm_page_pcpu_drain(page, order, mt);
>  		} while (count > 0 && !list_empty(list));
> -- 
> 2.42.0
> 



