Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> writes:

> David Hildenbrand <david@xxxxxxxxxx> writes:
>
>> On 11.02.22 10:16, Aneesh Kumar K V wrote:
>>> On 2/11/22 14:00, David Hildenbrand wrote:
>>>> On 11.02.22 07:52, Aneesh Kumar K.V wrote:
>>>>> commit: d9c234005227 ("Do not depend on MAX_ORDER when grouping pages by mobility")

....
....

> I could build a kernel with FORCE_MAX_ZONEORDER=8 and pageblock_order =
> 8. We need to disable THP for such a kernel to boot, because THP do
> check for PMD_ORDER < MAX_ORDER. I was able to boot that kernel on a
> virtualized platform, but then gigantic_page_runtime_supported is not
> supported on such config with hash translation.
>
> On non virtualized platform I am hitting crashes like below during boot.
>
> [ 47.637865][ C42] =============================================================================
> [ 47.637907][ C42] BUG pgtable-2^11 (Not tainted): Object already free
> [ 47.637925][ C42] -----------------------------------------------------------------------------
> [ 47.637925][ C42]
> [ 47.637945][ C42] Allocated in __pud_alloc+0x84/0x2a0 age=278 cpu=40 pid=1409
> [ 47.637974][ C42] __slab_alloc.isra.0+0x40/0x60
> [ 47.637995][ C42] kmem_cache_alloc+0x1a8/0x510
> [ 47.638010][ C42] __pud_alloc+0x84/0x2a0
> [ 47.638024][ C42] copy_page_range+0x38c/0x1b90
> [ 47.638040][ C42] dup_mm+0x548/0x880
> [ 47.638058][ C42] copy_process+0xdc0/0x1e90
> [ 47.638076][ C42] kernel_clone+0xd4/0x9d0
> [ 47.638094][ C42] __do_sys_clone+0x88/0xe0
> [ 47.638112][ C42] system_call_exception+0x368/0x3a0
> [ 47.638128][ C42] system_call_common+0xec/0x250
> [ 47.638147][ C42] Freed in __tlb_remove_table+0x1d4/0x200 age=263 cpu=57 pid=326
> [ 47.638172][ C42] kmem_cache_free+0x44c/0x680
> [ 47.638187][ C42] __tlb_remove_table+0x1d4/0x200
> [ 47.638204][ C42] tlb_remove_table_rcu+0x54/0xa0
> [ 47.638222][ C42] rcu_core+0xdd4/0x15d0
> [ 47.638239][ C42] __do_softirq+0x360/0x69c
> [ 47.638257][ C42] run_ksoftirqd+0x54/0xc0
> [ 47.638273][ C42] smpboot_thread_fn+0x28c/0x2f0
> [ 47.638290][ C42] kthread+0x1a4/0x1b0
> [ 47.638305][ C42] ret_from_kernel_thread+0x5c/0x64
> [ 47.638320][ C42] Slab 0xc00c00000000d600 objects=10 used=9 fp=0xc0000000035a8000 flags=0x7ffff000010201(locked|slab|head|node=0|zone=0|lastcpupid=0x7ffff)
> [ 47.638352][ C42] Object 0xc0000000035a8000 @offset=163840 fp=0x0000000000000000
> [ 47.638352][ C42]
> [ 47.638373][ C42] Redzone c0000000035a4000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> [ 47.638394][ C42] Redzone c0000000035a4010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> [ 47.638414][ C42] Redzone c0000000035a4020: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> [ 47.638435][ C42] Redzone c0000000035a4030: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> [ 47.638455][ C42] Redzone c0000000035a4040: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> [ 47.638474][ C42] Redzone c0000000035a4050: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> [ 47.638494][ C42] Redzone c0000000035a4060: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> [ 47.638514][ C42] Redzone c0000000035a4070: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> [ 47.638534][ C42] Redzone c0000000035a4080: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................

OK, that turned out to be unrelated: I was using the wrong kernel. I can
boot a kernel with pageblock_order > MAX_ORDER - 1 and run the hugetlb
related tests fine. I do get the warning below, which you had already
called out in your patch.

[ 3.952124] WARNING: CPU: 16 PID: 719 at mm/vmstat.c:1103 __fragmentation_index+0x14/0x70
[ 3.952136] Modules linked in:
[ 3.952141] CPU: 16 PID: 719 Comm: kswapd0 Tainted: G B 5.17.0-rc3-00044-g69052ffa0e08 #68
[ 3.952149] NIP: c000000000465264 LR: c000000000468544 CTR: 0000000000000000
[ 3.952154] REGS: c000000014a4f7e0 TRAP: 0700 Tainted: G B (5.17.0-rc3-00044-g69052ffa0e08)
[ 3.952161] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44042422 XER: 20000000
[ 3.952174] CFAR: c000000000468540 IRQMASK: 0
GPR00: c000000000468544 c000000014a4fa80 c000000001ea9500 0000000000000008
GPR04: c000000014a4faa0 00000000001fd700 0000000000004003 00000000001fd92d
GPR08: c000001fffd1c7a0 0000000000000008 0000000000000008 0000000000000000
GPR12: 0000000000002200 c000001fffff2880 0000000000000000 c000000013cfd240
GPR16: c000000011940600 c000001fffd21058 0000000000000d00 c000000001407d30
GPR20: ffffffffffffffaf c000001fffd21098 0000000000000000 c000000002ab7328
GPR24: c000000011940600 c000001fffd21300 0000000000000000 0000000000000008
GPR28: c000001fffd1c280 0000000000000008 0000000000000000 0000000000000004
[ 3.952231] NIP [c000000000465264] __fragmentation_index+0x14/0x70
[ 3.952237] LR [c000000000468544] fragmentation_index+0xb4/0xe0
[ 3.952244] Call Trace:
[ 3.952247] [c000000014a4fa80] [c00000000023e248] lock_release+0x138/0x470 (unreliable)
[ 3.952256] [c000000014a4fac0] [c00000000047cd84] compaction_suitable+0x94/0x270
[ 3.952263] [c000000014a4fb10] [c0000000004802b8] wakeup_kcompactd+0xc8/0x2a0
[ 3.952270] [c000000014a4fb60] [c000000000457568] balance_pgdat+0x798/0x8d0
[ 3.952277] [c000000014a4fca0] [c000000000457d14] kswapd+0x674/0x7b0
[ 3.952283] [c000000014a4fdc0] [c0000000001d7e84] kthread+0x144/0x150
[ 3.952290] [c000000014a4fe10] [c00000000000cd74] ret_from_kernel_thread+0x5c/0x64
[ 3.952297] Instruction dump:
[ 3.952301] 7d2021ad 40c2fff4 e8ed0030 38a00000 7caa39ae 4e800020 60000000 7c0802a6
[ 3.952311] 60000000 28030007 7c6a1b78 40810010 <0fe00000> 60000000 60000000 e9040008
[ 3.952322] irq event stamp: 0
[ 3.952325] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 3.952331] hardirqs last disabled at (0): [<c000000000196030>] copy_process+0x970/0x1de0
[ 3.952339] softirqs last enabled at (0): [<c000000000196030>] copy_process+0x970/0x1de0
[ 3.952345] softirqs last disabled at (0): [<0000000000000000>] 0x0

I am not sure whether there is any value in selecting MAX_ORDER = 8 on
ppc64. If not, we could do a patch as below for ppc64.

commit 09ed79c4fda92418914546f36c2750670503d7a0
Author: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
Date:   Fri Feb 11 17:15:10 2022 +0530

    powerpc/mm: Disable MAX_ORDER value 8 on book3s64 with 64K pagesize

    With transparent hugepage support we expect HPAGE_PMD_ORDER < MAX_ORDER.
    Without this we BUG() during boot as below

    cpu 0x6: Vector: 700 (Program Check) at [c000000012143880]
        pc: c000000001b4ddbc: hugepage_init+0x108/0x2c4
        lr: c000000001b4dd98: hugepage_init+0xe4/0x2c4
        sp: c000000012143b20
       msr: 8000000002029033
      current = 0xc0000000120d0f80
      paca    = 0xc00000001ec7e900   irqmask: 0x03   irq_happened: 0x01
        pid   = 1, comm = swapper/0
    kernel BUG at mm/huge_memory.c:413!
    [c000000012143b20] c0000000022c0468 blacklisted_initcalls+0x120/0x1c8 (unreliable)
    [c000000012143bb0] c000000000012104 do_one_initcall+0x94/0x520
    [c000000012143c90] c000000001b04da0 kernel_init_freeable+0x444/0x508
    [c000000012143da0] c000000000012d8c kernel_init+0x44/0x188
    [c000000012143e10] c00000000000cbf4 ret_from_kernel_thread+0x5c/0x64

    Hence a FORCE_MAX_ZONEORDER of value < 9 doesn't make sense with THP
    enabled. We also cannot have value > 9 because we are limited by
    SECTION_SIZE_BITS

     #if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
     #error Allocator MAX_ORDER exceeds SECTION_SIZE
     #endif

    We can select MAX_ORDER value 8 by disabling THP support but then that
    results in pageblock_order > MAX_ORDER - 1 which is not fully
    tested/supported.

    Cc: David Hildenbrand <david@xxxxxxxxxx>
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b779603978e1..a050f5f46df3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -807,7 +807,7 @@ config DATA_SHIFT
 
 config FORCE_MAX_ZONEORDER
 	int "Maximum zone order"
-	range 8 9 if PPC64 && PPC_64K_PAGES
+	range 9 9 if PPC64 && PPC_64K_PAGES
 	default "9" if PPC64 && PPC_64K_PAGES
 	range 13 13 if PPC64 && !PPC_64K_PAGES
 	default "13" if PPC64 && !PPC_64K_PAGES
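
To make the arithmetic behind the "range 9 9" change concrete, here is a
minimal standalone sketch (not kernel code) that evaluates the two
constraints from the commit message for a candidate MAX_ORDER. The
numbers are assumptions for book3s64 with 64K pages and hash
translation: PAGE_SHIFT = 16, SECTION_SIZE_BITS = 24 and
HPAGE_PMD_ORDER = 8. The EX_* constants and the helper names are made
up for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: book3s64, 64K pages, hash translation. */
enum {
	EX_PAGE_SHIFT        = 16, /* 64K base page */
	EX_SECTION_SIZE_BITS = 24, /* 16M sparsemem section */
	EX_HPAGE_PMD_ORDER   = 8,  /* 16M PMD-sized huge page */
};

/* THP expectation from the commit message: HPAGE_PMD_ORDER < MAX_ORDER. */
bool thp_order_ok(int max_order)
{
	assert(max_order > 0);
	return EX_HPAGE_PMD_ORDER < max_order;
}

/*
 * Mirrors the build-time check
 *   #if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
 * by returning true when that #error would NOT trigger.
 */
bool section_size_ok(int max_order)
{
	assert(max_order > 0);
	return (max_order - 1 + EX_PAGE_SHIFT) <= EX_SECTION_SIZE_BITS;
}

/* A value is usable with THP enabled only if both checks pass. */
bool max_order_usable_with_thp(int max_order)
{
	return thp_order_ok(max_order) && section_size_ok(max_order);
}
```

Under these assumptions only 9 passes both checks: 8 fails the THP check
(8 < 8 is false) and 10 fails the section check (9 + 16 = 25 > 24),
which is why the allowed range collapses to a single value.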