From: "Hailong.Liu" <hailong.liu@xxxxxxxx>

The scenario where the issue occurs is as follows:

CONFIG: vmap_allow_huge = true && 2M is for PMD_SIZE

kvmalloc(2M, __GFP_NOFAIL|GFP_XXX)
    __vmalloc_node_range(vm_flags=VM_ALLOW_HUGE_VMAP)
        vm_area_alloc_pages(order=9) ---> the order-9 allocation fails and
                                          falls back to order-0, and
                                          phys_addr is aligned with PMD_SIZE
            vmap_pages_range
                vmap_pages_range_noflush
                    __vmap_pages_range_noflush(page_shift = 21) ----> incorrect vmap *huge* here

In fact, as long as page_shift is not equal to PAGE_SHIFT, there might be
issues with __vmap_pages_range_noflush().

This patch also removes VM_ALLOW_HUGE_VMAP in kvmalloc_node(). There are
several reasons for this:
- It increases the memory footprint because of alignment.
- It increases the likelihood of kvmalloc allocation failures.
- Without it, the original issue, where kvmalloc with __GFP_NOFAIL may
  return NULL, is fixed.
Besides, if drivers want huge vmap mappings, they can use vmalloc_huge
instead.

Fix it by disabling the fallback and removing VM_ALLOW_HUGE_VMAP in
kvmalloc_node().

Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")

CC: Barry Song <21cnbao@xxxxxxxxx>
CC: Baoquan He <bhe@xxxxxxxxxx>
CC: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Reported-by: Tangquan.Zheng <zhengtangquan@xxxxxxxx>
Signed-off-by: Hailong.Liu <hailong.liu@xxxxxxxx>
---
 mm/util.c    | 2 +-
 mm/vmalloc.c | 9 ---------
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 669397235787..b23133b738cf 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -657,7 +657,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	 * protection games.
 	 */
 	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-			flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+			flags, PAGE_KERNEL, 0,
 			node, __builtin_return_address(0));
 }
 EXPORT_SYMBOL(kvmalloc_node);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 03c78fae06f3..1914768f473e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3577,15 +3577,6 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			page = alloc_pages(alloc_gfp, order);
 		else
 			page = alloc_pages_node(nid, alloc_gfp, order);
-		if (unlikely(!page)) {
-			if (!nofail)
-				break;
-
-			/* fall back to the zero order allocations */
-			alloc_gfp |= __GFP_NOFAIL;
-			order = 0;
-			continue;
-		}
 
 		/*
 		 * Higher order allocations must be able to be treated as
-- 
After [1], I checked the code and couldn't find a reasonable band-aid to
fix this, so the v2 patch works but is ugly. Glad to hear of a better
solution :)

[1] https://lore.kernel.org/lkml/20240724182827.nlgdckimtg2gwns5@xxxxxxxx/

2.34.1
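
For readers who want the failure mode in the call chain spelled out, here
is a minimal userspace sketch (not kernel code). It assumes that the huge
path walks pages[] with a stride of 1 << (page_shift - PAGE_SHIFT) and maps
each PMD-sized range from the physical address of the first page in that
stride, treating the rest of the stride as physically contiguous. The
helpers count_mismapped(), fill_contiguous() and fill_scattered(), the pfn
arrays, and the 4K/2M shift values are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12u	/* assumption: 4K base pages */
#define PMD_SHIFT  21u	/* assumption: 2M PMD, as in the scenario */

/*
 * Hypothetical model: pfns[i] is the physical frame number backing the
 * i-th base page of the allocation. A huge-stride walk reads pfns[i] only
 * at the start of each stride and assumes the remaining frames of that
 * stride follow it contiguously.
 *
 * Returns how many base pages would be mapped to a frame other than the
 * one actually allocated for them.
 */
static size_t count_mismapped(const unsigned long *pfns, size_t nr,
			      unsigned int page_shift)
{
	size_t stride, i, j, bad = 0;

	assert(page_shift >= PAGE_SHIFT);
	stride = (size_t)1 << (page_shift - PAGE_SHIFT);

	for (i = 0; i < nr; i += stride) {
		for (j = 0; j < stride && i + j < nr; j++) {
			/* frame the mapping would use vs. frame allocated */
			if (pfns[i] + j != pfns[i + j])
				bad++;
		}
	}
	return bad;
}

/* Model a successful order-9 allocation: one contiguous block. */
static void fill_contiguous(unsigned long *pfns, size_t nr, unsigned long base)
{
	for (size_t i = 0; i < nr; i++)
		pfns[i] = base + i;
}

/*
 * Model the order-0 fallback: the first frame is PMD-aligned (as in the
 * scenario), but the following frames are scattered, two frames apart.
 */
static void fill_scattered(unsigned long *pfns, size_t nr, unsigned long base)
{
	for (size_t i = 0; i < nr; i++)
		pfns[i] = base + 2 * i;
}
```

With a 512-entry array filled by fill_scattered(), count_mismapped(pfns,
512, PMD_SHIFT) reports 511 wrongly mapped pages, while the same array
mapped with page_shift == PAGE_SHIFT reports none. That matches the
observation that the problem can only appear when page_shift differs from
PAGE_SHIFT.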