On 09/18/23 10:52, Johannes Weiner wrote:
> On Mon, Sep 18, 2023 at 09:16:58AM +0200, Vlastimil Babka wrote:
> > On 9/16/23 21:57, Mike Kravetz wrote:
> > > On 09/15/23 10:16, Johannes Weiner wrote:
> > >> On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
> > >
> > > With the patch below applied, a slightly different workload triggers the
> > > following warnings.  It seems related, and appears to go away when
> > > reverting the series.
> > >
> > > [  331.595382] ------------[ cut here ]------------
> > > [  331.596665] page type is 5, passed migratetype is 1 (nr=512)
> > > [  331.598121] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:662 expand+0x1c9/0x200
> >
> > Initially I thought this demonstrates the possible race I was suggesting in
> > reply to 6/6. But, assuming you have CONFIG_CMA, page type 5 is cma and we
> > are trying to get a MOVABLE page from a CMA page block, which is something
> > that's normally done and the pageblock stays CMA. So yeah if the warnings
> > are to stay, they need to handle this case. Maybe the same can happen with
> > HIGHATOMIC blocks?
>
> Hm I don't think that's quite it.
>
> CMA and HIGHATOMIC have their own freelists. When MOVABLE requests dip
> into CMA and HIGHATOMIC, we explicitly pass that migratetype to
> __rmqueue_smallest(). This takes a chunk of e.g. CMA, expands the
> remainder to the CMA freelist, then returns the page. While you get a
> different mt than requested, the freelist typing should be consistent.
>
> In this splat, the migratetype passed to __rmqueue_smallest() is
> MOVABLE. There is no preceding warning from del_page_from_freelist()
> (Mike, correct me if I'm wrong), so we got a confirmed MOVABLE
> order-10 block from the MOVABLE list. So far so good. However, when we
> expand() the order-9 tail of this block to the MOVABLE list, it warns
> that its pageblock type is CMA.
>
> This means we have an order-10 page where one half is MOVABLE and the
> other is CMA.
>
> I don't see how the merging code in __free_one_page() could have done
> that. The CMA buddy would have failed the migrate_is_mergeable() test
> and we should have left it at order-9s.
>
> I also don't see how the CMA setup could have done this because
> MIGRATE_CMA is set on the range before the pages are fed to the buddy.
>
> Mike, could you describe the workload that is triggering this?

This 'slightly different workload' is actually a slightly different
environment.  Sorry for misspeaking!  The slight difference is that this
environment does not use the 'alloc hugetlb gigantic pages from CMA'
(hugetlb_cma) feature that triggered the previous issue.

This is still on a 16G VM.  Kernel command line here is:
"BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.6.0-rc1-next-20230913+ root=UUID=49c13301-2555-44dc-847b-caabe1d62bdf ro console=tty0 console=ttyS0,115200 audit=0 selinux=0 transparent_hugepage=always hugetlb_free_vmemmap=on"

The workload is just running this script:

while true; do
    echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
    echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
    echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
done

> Does this reproduce instantly and reliably?

It is not 'instant' but will reproduce fairly reliably within a minute
or so.

Note that the 'echo 4 > .../hugepages-1048576kB/nr_hugepages' is going
to end up calling alloc_contig_pages -> alloc_contig_range.  Those pages
will eventually be freed via __free_pages(folio, 9).

> Is there high load on the system, or is it requesting the huge page
> with not much else going on?

Only the script was running.

> Do you see compact_* history in /proc/vmstat after this triggers?

As one might expect, compact_isolated continually increases during this
run.

> Could you please also provide /proc/zoneinfo, /proc/pagetypeinfo and
> the hugetlb_cma= parameter you're using?

As mentioned above, hugetlb_cma is not used in this environment.
Strangely enough, this does not reproduce (easily at least) if I use
hugetlb_cma as in the previous report.

The following are during a run after WARNING is triggered.

# cat /proc/zoneinfo
Node 0, zone      DMA
  per-node stats
      nr_inactive_anon 11800
      nr_active_anon 109
      nr_inactive_file 38161
      nr_active_file 10007
      nr_unevictable 12
      nr_slab_reclaimable 2766
      nr_slab_unreclaimable 6881
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_nodes 0
      workingset_refault_anon 0
      workingset_refault_file 0
      workingset_activate_anon 0
      workingset_activate_file 0
      workingset_restore_anon 0
      workingset_restore_file 0
      workingset_nodereclaim 0
      nr_anon_pages 11750
      nr_mapped    18402
      nr_file_pages 48339
      nr_dirty     0
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem     166
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_file_hugepages 0
      nr_file_pmdmapped 0
      nr_anon_transparent_hugepages 6
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 0
      nr_dirtied   14766
      nr_written   7701
      nr_throttled_written 0
      nr_kernel_misc_reclaimable 0
      nr_foll_pin_acquired 96
      nr_foll_pin_released 96
      nr_kernel_stack 1816
      nr_page_table_pages 1100
      nr_sec_page_table_pages 0
      nr_swapcached 0
  pages free     3840
        boost    0
        min      21
        low      26
        high     31
        spanned  4095
        present  3998
        managed  3840
        cma      0
        protection: (0, 1908, 7923, 7923)
      nr_free_pages 3840
      nr_zone_inactive_anon 0
      nr_zone_active_anon 0
      nr_zone_inactive_file 0
      nr_zone_active_file 0
      nr_zone_unevictable 0
      nr_zone_write_pending 0
      nr_mlock     0
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  0
      numa_hit     0
      numa_miss    0
      numa_foreign 0
      numa_interleave 0
      numa_local   0
      numa_other   0
  pagesets
    cpu: 0
              count: 0
              high:  13
              batch: 1
  vm stats threshold: 6
    cpu: 1
              count: 0
              high:  13
              batch: 1
  vm stats threshold: 6
    cpu: 2
              count: 0
              high:  13
              batch: 1
  vm stats threshold: 6
    cpu: 3
              count: 0
              high:  13
              batch: 1
  vm stats threshold: 6
  node_unreclaimable:  0
  start_pfn:           1
Node 0, zone    DMA32
  pages free     495317
        boost    0
        min      2687
        low      3358
        high     4029
        spanned  1044480
        present  520156
        managed  496486
        cma      0
        protection: (0, 0, 6015, 6015)
      nr_free_pages 495317
      nr_zone_inactive_anon 0
      nr_zone_active_anon 0
      nr_zone_inactive_file 0
      nr_zone_active_file 0
      nr_zone_unevictable 0
      nr_zone_write_pending 0
      nr_mlock     0
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  0
      numa_hit     0
      numa_miss    0
      numa_foreign 0
      numa_interleave 0
      numa_local   0
      numa_other   0
  pagesets
    cpu: 0
              count: 913
              high:  1679
              batch: 63
  vm stats threshold: 30
    cpu: 1
              count: 0
              high:  1679
              batch: 63
  vm stats threshold: 30
    cpu: 2
              count: 0
              high:  1679
              batch: 63
  vm stats threshold: 30
    cpu: 3
              count: 256
              high:  1679
              batch: 63
  vm stats threshold: 30
  node_unreclaimable:  0
  start_pfn:           4096
Node 0, zone   Normal
  pages free     1360836
        boost    0
        min      8473
        low      10591
        high     12709
        spanned  1572864
        present  1572864
        managed  1552266
        cma      0
        protection: (0, 0, 0, 0)
      nr_free_pages 1360836
      nr_zone_inactive_anon 11800
      nr_zone_active_anon 109
      nr_zone_inactive_file 38161
      nr_zone_active_file 10007
      nr_zone_unevictable 12
      nr_zone_write_pending 0
      nr_mlock     12
      nr_bounce    0
      nr_zspages   3
      nr_free_cma  0
      numa_hit     10623572
      numa_miss    0
      numa_foreign 0
      numa_interleave 1357
      numa_local   6902986
      numa_other   3720586
  pagesets
    cpu: 0
              count: 156
              high:  5295
              batch: 63
  vm stats threshold: 42
    cpu: 1
              count: 210
              high:  5295
              batch: 63
  vm stats threshold: 42
    cpu: 2
              count: 4956
              high:  5295
              batch: 63
  vm stats threshold: 42
    cpu: 3
              count: 1
              high:  5295
              batch: 63
  vm stats threshold: 42
  node_unreclaimable:  0
  start_pfn:           1048576
Node 0, zone  Movable
  pages free     0
        boost    0
        min      32
        low      32
        high     32
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0)
Node 1, zone      DMA
  pages free     0
        boost    0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0)
Node 1, zone    DMA32
  pages free     0
        boost    0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0)
Node 1, zone   Normal
  per-node stats
      nr_inactive_anon 15381
      nr_active_anon 81
      nr_inactive_file 66550
      nr_active_file 25965
      nr_unevictable 421
      nr_slab_reclaimable 4069
      nr_slab_unreclaimable 7836
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_nodes 0
      workingset_refault_anon 0
      workingset_refault_file 0
      workingset_activate_anon 0
      workingset_activate_file 0
      workingset_restore_anon 0
      workingset_restore_file 0
      workingset_nodereclaim 0
      nr_anon_pages 15420
      nr_mapped    24331
      nr_file_pages 92978
      nr_dirty     0
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem     100
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_file_hugepages 0
      nr_file_pmdmapped 0
      nr_anon_transparent_hugepages 11
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 0
      nr_dirtied   6217
      nr_written   2902
      nr_throttled_written 0
      nr_kernel_misc_reclaimable 0
      nr_foll_pin_acquired 0
      nr_foll_pin_released 0
      nr_kernel_stack 1656
      nr_page_table_pages 756
      nr_sec_page_table_pages 0
      nr_swapcached 0
  pages free     1829073
        boost    0
        min      11345
        low      14181
        high     17017
        spanned  2097152
        present  2097152
        managed  2086594
        cma      0
        protection: (0, 0, 0, 0)
      nr_free_pages 1829073
      nr_zone_inactive_anon 15381
      nr_zone_active_anon 81
      nr_zone_inactive_file 66550
      nr_zone_active_file 25965
      nr_zone_unevictable 421
      nr_zone_write_pending 0
      nr_mlock     421
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  0
      numa_hit     10522401
      numa_miss    0
      numa_foreign 0
      numa_interleave 961
      numa_local   4057399
      numa_other   6465002
  pagesets
    cpu: 0
              count: 0
              high:  7090
              batch: 63
  vm stats threshold: 42
    cpu: 1
              count: 17
              high:  7090
              batch: 63
  vm stats threshold: 42
    cpu: 2
              count: 6997
              high:  7090
              batch: 63
  vm stats threshold: 42
    cpu: 3
              count: 0
              high:  7090
              batch: 63
  vm stats threshold: 42
  node_unreclaimable:  0
  start_pfn:           2621440
Node 1, zone  Movable
  pages free     0
        boost    0
        min      32
        low      32
        high     32
        spanned  0
        present  0
        managed  0
        cma      0
        protection: (0, 0, 0, 0)

# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      0      0      0      0      0      0      0      1      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Movable      1      0      1      2      2      3      3      3      4      4    480
Node    0, zone    DMA32, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable    566     14     22      7      8      8      9      4      7      0      1
Node    0, zone   Normal, type      Movable    214    299    120     53     15     10      6      6      1      4   1159
Node    0, zone   Normal, type  Reclaimable      0      9     18     11      6      1      0      0      0      0      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate
Node 0, zone      DMA             1            7            0            0            0            0
Node 0, zone    DMA32             0         1016            0            0            0            0
Node 0, zone   Normal            71         2995            6            0            0            0
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    1, zone   Normal, type    Unmovable    459     12      5      6      6      5      5      5      6      2      1
Node    1, zone   Normal, type      Movable   1287    502    171     85     34     14     13      8      2      5   1861
Node    1, zone   Normal, type  Reclaimable      1      5     12      6      9      3      1      1      0      1      0
Node    1, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    1, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    1, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      3

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate
Node 1, zone   Normal           101         3977           10            0            0            8
--
Mike Kravetz
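
For anyone repeating the compact_* check from /proc/vmstat, a minimal
sketch follows. It assumes two saved copies of /proc/vmstat (one taken
before starting the script, one after the WARNING) and prints how much
each compact_* counter grew. The name compact_delta and the /tmp paths
are hypothetical, chosen only for illustration; this is not an existing
tool and was not part of the original run.

```shell
#!/bin/sh
# compact_delta BEFORE AFTER
# Hypothetical helper: BEFORE and AFTER are saved copies of /proc/vmstat
# ("name value" per line). For every compact_* counter present in AFTER,
# print "name growth", where growth = AFTER value - BEFORE value.
compact_delta() {
    awk '
        # First file: remember the starting value of each compact_* counter.
        NR == FNR { if ($1 ~ /^compact_/) before[$1] = $2; next }
        # Second file: print the difference for each compact_* counter.
        $1 ~ /^compact_/ { print $1, $2 - before[$1] }
    ' "$1" "$2"
}

# Intended use on a live system (paths are placeholders):
#   cat /proc/vmstat > /tmp/vmstat.before
#   ... run the reproducer until the WARNING shows up ...
#   cat /proc/vmstat > /tmp/vmstat.after
#   compact_delta /tmp/vmstat.before /tmp/vmstat.after
```

A steadily growing compact_isolated delta would match what was observed
above; counters missing from the BEFORE snapshot are treated as starting
from zero.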