Excerpts from Chen Wandun's message of October 15, 2021 12:31 pm: > > > 在 2021/10/15 9:34, Nicholas Piggin 写道: >> Excerpts from Chen Wandun's message of October 14, 2021 6:59 pm: >>> >>> >>> 在 2021/10/14 5:46, Shakeel Butt 写道: >>>> On Tue, Sep 28, 2021 at 5:03 AM Chen Wandun <chenwandun@xxxxxxxxxx> wrote: >>>>> >>>>> Eric Dumazet reported a strange numa spreading info in [1], and found >>>>> commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings") introduced >>>>> this issue [2]. >>>>> >>>>> Dig into the difference before and after this patch, page allocation has >>>>> some difference: >>>>> >>>>> before: >>>>> alloc_large_system_hash >>>>> __vmalloc >>>>> __vmalloc_node(..., NUMA_NO_NODE, ...) >>>>> __vmalloc_node_range >>>>> __vmalloc_area_node >>>>> alloc_page /* because NUMA_NO_NODE, so choose alloc_page branch */ >>>>> alloc_pages_current >>>>> alloc_page_interleave /* can be proved by print policy mode */ >>>>> >>>>> after: >>>>> alloc_large_system_hash >>>>> __vmalloc >>>>> __vmalloc_node(..., NUMA_NO_NODE, ...) >>>>> __vmalloc_node_range >>>>> __vmalloc_area_node >>>>> alloc_pages_node /* choose nid by nuam_mem_id() */ >>>>> __alloc_pages_node(nid, ....) >>>>> >>>>> So after commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings"), >>>>> it will allocate memory in current node instead of interleaving allocate >>>>> memory. >>>>> >>>>> [1] >>>>> https://lore.kernel.org/linux-mm/CANn89iL6AAyWhfxdHO+jaT075iOa3XcYn9k6JJc7JR2XYn6k_Q@xxxxxxxxxxxxxx/ >>>>> >>>>> [2] >>>>> https://lore.kernel.org/linux-mm/CANn89iLofTR=AK-QOZY87RdUZENCZUT4O6a0hvhu3_EwRMerOg@xxxxxxxxxxxxxx/ >>>>> >>>>> Fixes: 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings") >>>>> Reported-by: Eric Dumazet <edumazet@xxxxxxxxxx> >>>>> Signed-off-by: Chen Wandun <chenwandun@xxxxxxxxxx> >>>>> --- >>>>> mm/vmalloc.c | 33 ++++++++++++++++++++++++++------- >>>>> 1 file changed, 26 insertions(+), 7 deletions(-) >>>>> >>>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c >>>>> index f884706c5280..48e717626e94 100644 >>>>> --- a/mm/vmalloc.c >>>>> +++ b/mm/vmalloc.c >>>>> @@ -2823,6 +2823,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, >>>>> unsigned int order, unsigned int nr_pages, struct page **pages) >>>>> { >>>>> unsigned int nr_allocated = 0; >>>>> + struct page *page; >>>>> + int i; >>>>> >>>>> /* >>>>> * For order-0 pages we make use of bulk allocator, if >>>>> @@ -2833,6 +2835,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid, >>>>> if (!order) { >>>> >>>> Can you please replace the above with if (!order && nid != NUMA_NO_NODE)? >>>> >>>>> while (nr_allocated < nr_pages) { >>>>> unsigned int nr, nr_pages_request; >>>>> + page = NULL; >>>>> >>>>> /* >>>>> * A maximum allowed request is hard-coded and is 100 >>>>> @@ -2842,9 +2845,23 @@ vm_area_alloc_pages(gfp_t gfp, int nid, >>>>> */ >>>>> nr_pages_request = min(100U, nr_pages - nr_allocated); >>>>> >>>> >>>> Undo the following change in this if block. >>> >>> Yes, It seem like more simpler as you suggested, But it still have >>> performance regression, I plan to change the following to consider >>> both mempolcy and alloc_pages_bulk. >> >> Thanks for finding and debugging this. These APIs are a maze of twisty >> little passages, all alike so I could be as confused as I was when I >> wrote that patch, but doesn't a minimal fix look something like this? > > Yes, I sent a patch,it looks like as you show, besides it also > contains some performance optimization. > > [PATCH] mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to > accelerate memory allocation Okay. It would be better to do it as two patches. First the minimal fix so it can be backported easily and have the Fixes: tag pointed at my commit. Then the performance optimization. Thanks, Nick