On Mon, Sep 18, 2023 at 10:40:37AM -0700, Mike Kravetz wrote:
> On 09/18/23 10:52, Johannes Weiner wrote:
> > On Mon, Sep 18, 2023 at 09:16:58AM +0200, Vlastimil Babka wrote:
> > > On 9/16/23 21:57, Mike Kravetz wrote:
> > > > On 09/15/23 10:16, Johannes Weiner wrote:
> > > >> On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
> > > >
> > > > With the patch below applied, a slightly different workload triggers
> > > > the following warnings. It seems related, and appears to go away when
> > > > reverting the series.
> > > >
> > > > [  331.595382] ------------[ cut here ]------------
> > > > [  331.596665] page type is 5, passed migratetype is 1 (nr=512)
> > > > [  331.598121] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:662 expand+0x1c9/0x200
> > >
> > > Initially I thought this demonstrates the possible race I was suggesting
> > > in reply to 6/6. But, assuming you have CONFIG_CMA, page type 5 is cma
> > > and we are trying to get a MOVABLE page from a CMA page block, which is
> > > something that's normally done and the pageblock stays CMA. So yeah, if
> > > the warnings are to stay, they need to handle this case. Maybe the same
> > > can happen with HIGHATOMIC blocks?

Ok, the CMA thing gave me pause because Mike's pagetypeinfo didn't show
any CMA pages.

5 is actually MIGRATE_ISOLATE - see the double use of 3 for
MIGRATE_PCPTYPES and MIGRATE_HIGHATOMIC (the enum is spelled out
below).

> > This means we have an order-10 page where one half is MOVABLE and the
> > other is CMA.

So the scenario is different:

We get a MAX_ORDER page off the MOVABLE freelist. The removal checks
that the first pageblock is indeed MOVABLE. During the expand, the
second pageblock turns out to be of type MIGRATE_ISOLATE.

The page allocator wouldn't have merged those two types, and it
triggers a bit too fast to be a race condition. It appears that
MIGRATE_ISOLATE is simply set on the tail pageblock while the head is
on the list, and is then stranded there.

Could this be an issue in the page_isolation code? Maybe a range
rounding error?

Zi Yan, does this ring a bell for you?

I don't quite see how my patches could have caused this. But AFAICS we
also didn't have warnings for this scenario before, so it could be an
old bug.

> > Mike, could you describe the workload that is triggering this?
>
> This 'slightly different workload' is actually a slightly different
> environment. Sorry for misspeaking! The difference is that this
> environment does not use the 'alloc hugetlb gigantic pages from CMA'
> (hugetlb_cma) feature that triggered the previous issue.
>
> This is still on a 16G VM. Kernel command line here is:
> "BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.6.0-rc1-next-20230913+
> root=UUID=49c13301-2555-44dc-847b-caabe1d62bdf ro console=tty0
> console=ttyS0,115200 audit=0 selinux=0 transparent_hugepage=always
> hugetlb_free_vmemmap=on"
>
> The workload is just running this script:
>
> while true; do
>   echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>   echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
>   echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> done
>
> > Does this reproduce instantly and reliably?
>
> It is not 'instant' but will reproduce fairly reliably within a minute
> or so.
>
> Note that the 'echo 4 > .../hugepages-1048576kB/nr_hugepages' is going
> to end up calling alloc_contig_pages -> alloc_contig_range. Those pages
> will eventually be freed via __free_pages(folio, 9).
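To spell out the migratetype numbering I referred to above: with
CONFIG_CMA and CONFIG_MEMORY_ISOLATION both enabled, the enum in
include/linux/mmzone.h should work out to the following (the numeric
values in the comments are mine):

	enum migratetype {
		MIGRATE_UNMOVABLE,	/* 0 */
		MIGRATE_MOVABLE,	/* 1 */
		MIGRATE_RECLAIMABLE,	/* 2 */
		MIGRATE_PCPTYPES,	/* 3 - number of types on the pcp lists */
		MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,	/* 3, shared */
		MIGRATE_CMA,		/* 4, only with CONFIG_CMA */
		MIGRATE_ISOLATE,	/* 5, only with CONFIG_MEMORY_ISOLATION */
		MIGRATE_TYPES
	};

So "page type is 5" in the warning is a pageblock marked
MIGRATE_ISOLATE, and "passed migratetype is 1" is the MOVABLE freelist
the page was taken from.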
No luck reproducing this yet, but I have a question. In that crash
stack trace, expand() is called via this path:

[  331.645847]  get_page_from_freelist+0x3ed/0x1040
[  331.646837]  ? prepare_alloc_pages.constprop.0+0x197/0x1b0
[  331.647977]  __alloc_pages+0xec/0x240
[  331.648783]  alloc_buddy_hugetlb_folio.isra.0+0x6a/0x150
[  331.649912]  __alloc_fresh_hugetlb_folio+0x157/0x230
[  331.650938]  alloc_pool_huge_folio+0xad/0x110
[  331.651909]  set_max_huge_pages+0x17d/0x390

I don't see an __alloc_fresh_hugetlb_folio() in my tree, only
alloc_fresh_hugetlb_folio(), which has this:

	if (hstate_is_gigantic(h))
		folio = alloc_gigantic_folio(h, gfp_mask, nid, nmask);
	else
		folio = alloc_buddy_hugetlb_folio(h, gfp_mask, nid, nmask,
						  node_alloc_noretry);

where gigantic is defined as the order exceeding MAX_ORDER, which
should be the case for 1G pages on x86.

So the crashing stack must be from a 2M allocation, no? I'm confused
how that could happen with the above test case.
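For reference, the check I'm reading is this one in
include/linux/hugetlb.h:

	static inline bool hstate_is_gigantic(struct hstate *h)
	{
		return huge_page_order(h) > MAX_ORDER;
	}

With MAX_ORDER at 10 on x86, a 1G hstate (order 18) should always take
the alloc_gigantic_folio() path; only the 2M hstate (order 9) should
ever reach alloc_buddy_hugetlb_folio() and thus expand().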