On Thu, Jul 25, 2024 at 5:58 PM Hailong Liu <hailong.liu@xxxxxxxx> wrote:
>
> On Thu, 25. Jul 21:34, Barry Song wrote:
> > On Thu, Jul 25, 2024 at 9:17 PM Hailong Liu <hailong.liu@xxxxxxxx> wrote:
> > >
> > > On Thu, 25. Jul 18:21, Barry Song wrote:
> > > > On Thu, Jul 25, 2024 at 3:53 PM <hailong.liu@xxxxxxxx> wrote:
> > > [snip]
> > > >
> > > > This is still incorrect because it undoes Michal's work. We also need to break
> > > > the loop if (!nofail), which you're currently omitting.
> > >
> > > IIUC, the original issue is to fix kvcalloc with __GFP_NOFAIL returning NULL.
> > > https://lore.kernel.org/all/ZAXynvdNqcI0f6Us@xxxxxxxxxxxxxx/T/#u
> > > If we disable the huge flag in kvmalloc_node, the issue will be fixed.
> >
> > No, this just bypasses kvmalloc and doesn't solve the underlying issue. Problems
> > can still be triggered by vmalloc_huge() even after the bypass. Once we
> > reorganize vmap_huge to support the combination of PMD and PTE
> > mapping, we should re-enable HUGE_VMAP for kvmalloc.
> Totally agree. This will take some time to support. As in [1], I plan to fix it
> with an offset in page_private to indicate the location of the fallback.
>
> >
> > I would consider dropping VM_ALLOW_HUGE_VMAP for kvmalloc as
> > a short-term "optimization" to save memory rather than a long-term fix. This
> > 'optimization' is only valid until we reorganize HUGE_VMAP in a way
> > similar to THP. I mean, for a 2.1MB kvmalloc, we can map 2MB as PMD
> > and 0.1MB as PTE.
> However, this only fixes kvmalloc_node; for others who call
> vmalloc_huge(), the issue still exists. So I removed Michal's code. Sorry for this.

My proposal was to fall back to order-0 for __GFP_NOFAIL even before
vm_area_alloc_pages() as a short-term quick "fix".

We need to meet three conditions to do HUGE_VMAP:
1. vmap_allow_huge
2. vm_flags & VM_ALLOW_HUGE_VMAP
3. !__GFP_NOFAIL in gfp_flags

This is because if we fall back within vm_area_alloc_pages(), the caller
still expects vm_area_alloc_pages() to return contiguous 2MB memory. By
removing this assumption in the caller, the caller knows that
vm_area_alloc_pages() is returning small pages.

That means vm_area gets 0 as page_order from the very beginning if
we have __GFP_NOFAIL in gfp_flags.

Other fixes appear to require significant changes to the source code
and can't be done quickly.

> > >
> > > >
> > > > To avoid reverting Michal's work, the simplest "fix" would be,
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index caf032f0bd69..0011ca30df1c 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -3775,7 +3775,7 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
> > > >                 return NULL;
> > > >         }
> > > >
> > > > -       if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
> > > > +       if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP) && !(gfp_mask & __GFP_NOFAIL)) {
> > > >                 unsigned long size_per_node;
> > > >
> > > >                 /*
> > > >
> > >
> > > [1] https://lore.kernel.org/lkml/20240724182827.nlgdckimtg2gwns5@xxxxxxxx/
> > > > > 2.34.1
> > > >
> > > > Thanks
> > > > Barry
> > >
> > > --
> > > help you, help me,
> > > Hailong.
>
> --
> help you, help me,
> Hailong.
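
For reference, the three conditions for HUGE_VMAP discussed in this thread
can be sketched as a small user-space predicate. This is only an
illustration, not kernel code: the SKETCH_* constants and both helper
names are hypothetical stand-ins, and the real flag values come from the
kernel headers. The second helper shows why the last test has to be joined
with a logical && rather than a bitwise &: the masked flag value ANDed
with the 0/1 result of the negation is always 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values for illustration only; the real definitions live in
 * the kernel headers and may differ. */
#define SKETCH_VM_ALLOW_HUGE_VMAP 0x400UL
#define SKETCH_GFP_NOFAIL         0x8000U

/* Huge mappings are attempted only when all three conditions hold:
 * the global switch is on, the caller opted in via vm_flags, and the
 * allocation is not __GFP_NOFAIL. */
static inline bool want_huge_vmap(bool vmap_allow_huge,
                                  unsigned long vm_flags,
                                  unsigned int gfp_mask)
{
	return vmap_allow_huge &&
	       (vm_flags & SKETCH_VM_ALLOW_HUGE_VMAP) &&
	       !(gfp_mask & SKETCH_GFP_NOFAIL);
}

/* Same test with a bitwise '&' in front of the negation. The left
 * operand is 0 or 0x400 and the right operand is 0 or 1, so the result
 * is always 0 and huge mappings would never be attempted. */
static inline bool want_huge_vmap_bitwise(bool vmap_allow_huge,
                                          unsigned long vm_flags,
                                          unsigned int gfp_mask)
{
	return vmap_allow_huge &&
	       ((vm_flags & SKETCH_VM_ALLOW_HUGE_VMAP) &
	        !(gfp_mask & SKETCH_GFP_NOFAIL));
}
```

With these stand-ins, want_huge_vmap(true, SKETCH_VM_ALLOW_HUGE_VMAP, 0)
is true, adding SKETCH_GFP_NOFAIL to the mask makes it false, and the
bitwise variant is false for every input.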