vmalloc often gets its pages by calling alloc_page() in a loop, which
costs too much time in per-call work such as the watermark and cpuset
checks. Let's first try to allocate with alloc_pages_bulk and, if that
fails to fill the request, fall back to the original path.

With my own test, simulating the alloc_page() loop versus a single
alloc_pages_bulk_array() call, I get:

  size      1M    10M   20M   30M
  normal    44    1278  3665  5581
  test      34    889   2167  3300
  optimize  22%   30%   40%   40%

In my sorted list of top vmalloc users, zram and f2fs may allocate more
than 20MB at a time, so it is worth using alloc_pages_bulk.

Signed-off-by: Yang Huan <link@xxxxxxxx>
---
 mm/vmalloc.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a13ac524f6ff..b5af7b4e30bc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2791,17 +2791,23 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	}
 
 	area->pages = pages;
-	area->nr_pages = nr_small_pages;
+	area->nr_pages = 0;
 	set_vm_area_page_order(area, page_shift - PAGE_SHIFT);
 
 	page_order = vm_area_page_order(area);
-
+	/* first try to allocate in bulk when order is 0 */
+	if (!page_order) {
+		area->nr_pages = alloc_pages_bulk_array(
+			gfp_mask, nr_small_pages, area->pages);
+		if (likely(area->nr_pages == nr_small_pages))
+			goto success;
+	}
 	/*
 	 * Careful, we allocate and map page_order pages, but tracking is done
 	 * per PAGE_SIZE page so as to keep the vm_struct APIs independent of
 	 * the physical/mapped size.
 	 */
-	for (i = 0; i < area->nr_pages; i += 1U << page_order) {
+	for (i = area->nr_pages; i < nr_small_pages; i += 1U << page_order) {
 		struct page *page;
 		int p;
 
@@ -2824,6 +2830,8 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		if (gfpflags_allow_blocking(gfp_mask))
 			cond_resched();
 	}
+	area->nr_pages = nr_small_pages;
+success:
 	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
 
 	if (vmap_pages_range(addr, addr + size, prot, pages, page_shift) < 0) {
-- 
2.32.0
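
For reference, below is a minimal sketch of the kind of throwaway module
used to compare the alloc_page() loop against one bulk call. It assumes a
kernel that already provides alloc_pages_bulk_array() (v5.13+); the module
name, parameter, and timing loop are illustrative only and are not the
exact harness behind the numbers in the table above.

/*
 * bulk_bench: time nr_pages order-0 allocations done one page at a time
 * versus a single alloc_pages_bulk_array() call filling the same array.
 */
#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/ktime.h>
#include <linux/string.h>

static unsigned long nr_pages = (20UL << 20) >> PAGE_SHIFT;	/* ~20MB */
module_param(nr_pages, ulong, 0444);

static void free_all(struct page **pages, unsigned long nr)
{
	unsigned long i;

	for (i = 0; i < nr; i++)
		if (pages[i])
			__free_page(pages[i]);
}

static int __init bulk_bench_init(void)
{
	struct page **pages;
	unsigned long i, got;
	ktime_t t0;
	s64 loop_us, bulk_us;

	pages = kvcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	/* one alloc_page() call per page, like the existing vmalloc loop */
	t0 = ktime_get();
	for (i = 0; i < nr_pages; i++)
		pages[i] = alloc_page(GFP_KERNEL);
	loop_us = ktime_us_delta(ktime_get(), t0);
	free_all(pages, nr_pages);
	memset(pages, 0, nr_pages * sizeof(*pages));

	/* one bulk call filling the same (now empty) array */
	t0 = ktime_get();
	got = alloc_pages_bulk_array(GFP_KERNEL, nr_pages, pages);
	bulk_us = ktime_us_delta(ktime_get(), t0);
	free_all(pages, nr_pages);

	pr_info("nr_pages=%lu loop=%lldus bulk=%lldus (bulk filled %lu)\n",
		nr_pages, loop_us, bulk_us, got);

	kvfree(pages);
	return 0;
}

static void __exit bulk_bench_exit(void)
{
}

module_init(bulk_bench_init);
module_exit(bulk_bench_exit);
MODULE_LICENSE("GPL");

Note that alloc_pages_bulk_array() may return fewer than nr_pages entries,
which is exactly why the patch keeps the original loop as a fallback and
only takes the "success" fast path when the whole request was satisfied.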