On Fri, Oct 15, 2021 at 05:11:25PM +1000, Nicholas Piggin wrote:
> Excerpts from Chen Wandun's message of October 15, 2021 12:31 pm:
> > On 2021/10/15 9:34, Nicholas Piggin wrote:
> >> Excerpts from Chen Wandun's message of October 14, 2021 6:59 pm:
> >>> On 2021/10/14 5:46, Shakeel Butt wrote:
> >>>> On Tue, Sep 28, 2021 at 5:03 AM Chen Wandun <chenwandun@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> Eric Dumazet reported strange NUMA spreading behaviour in [1], and
> >>>>> found that commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings")
> >>>>> introduced this issue [2].
> >>>>>
> >>>>> Digging into the difference before and after this patch, the page
> >>>>> allocation paths differ:
> >>>>>
> >>>>> before:
> >>>>> alloc_large_system_hash
> >>>>>   __vmalloc
> >>>>>     __vmalloc_node(..., NUMA_NO_NODE, ...)
> >>>>>       __vmalloc_node_range
> >>>>>         __vmalloc_area_node
> >>>>>           alloc_page /* because NUMA_NO_NODE, so choose the alloc_page branch */
> >>>>>             alloc_pages_current
> >>>>>               alloc_page_interleave /* can be verified by printing the policy mode */
> >>>>>
> >>>>> after:
> >>>>> alloc_large_system_hash
> >>>>>   __vmalloc
> >>>>>     __vmalloc_node(..., NUMA_NO_NODE, ...)
> >>>>>       __vmalloc_node_range
> >>>>>         __vmalloc_area_node
> >>>>>           alloc_pages_node /* nid chosen by numa_mem_id() */
> >>>>>             __alloc_pages_node(nid, ....)
> >>>>>
> >>>>> So after commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings"),
> >>>>> memory is allocated on the current node instead of being interleaved
> >>>>> across nodes.
> >>>>>
> >>>>> [1]
> >>>>> https://lore.kernel.org/linux-mm/CANn89iL6AAyWhfxdHO+jaT075iOa3XcYn9k6JJc7JR2XYn6k_Q@xxxxxxxxxxxxxx/
> >>>>>
> >>>>> [2]
> >>>>> https://lore.kernel.org/linux-mm/CANn89iLofTR=AK-QOZY87RdUZENCZUT4O6a0hvhu3_EwRMerOg@xxxxxxxxxxxxxx/
> >>>>>
> >>>>> Fixes: 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings")
> >>>>> Reported-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> >>>>> Signed-off-by: Chen Wandun <chenwandun@xxxxxxxxxx>
> >>>>> ---
> >>>>>  mm/vmalloc.c | 33 ++++++++++++++++++++++++++-------
> >>>>>  1 file changed, 26 insertions(+), 7 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> >>>>> index f884706c5280..48e717626e94 100644
> >>>>> --- a/mm/vmalloc.c
> >>>>> +++ b/mm/vmalloc.c
> >>>>> @@ -2823,6 +2823,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> >>>>>                 unsigned int order, unsigned int nr_pages, struct page **pages)
> >>>>>  {
> >>>>>         unsigned int nr_allocated = 0;
> >>>>> +       struct page *page;
> >>>>> +       int i;
> >>>>>
> >>>>>         /*
> >>>>>          * For order-0 pages we make use of bulk allocator, if
> >>>>> @@ -2833,6 +2835,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> >>>>>         if (!order) {
> >>>>
> >>>> Can you please replace the above with if (!order && nid != NUMA_NO_NODE)?
> >>>>
> >>>>>                 while (nr_allocated < nr_pages) {
> >>>>>                         unsigned int nr, nr_pages_request;
> >>>>> +                       page = NULL;
> >>>>>
> >>>>>                         /*
> >>>>>                          * A maximum allowed request is hard-coded and is 100
> >>>>> @@ -2842,9 +2845,23 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> >>>>>                          */
> >>>>>                         nr_pages_request = min(100U, nr_pages - nr_allocated);
> >>>>>
> >>>> Undo the following change in this if block.
> >>>
> >>> Yes, it seems simpler as you suggested, but it still has a performance
> >>> regression. I plan to change the following to consider both mempolicy
> >>> and alloc_pages_bulk.
> >>
> >> Thanks for finding and debugging this.
> >> These APIs are a maze of twisty little passages, all alike, so I could
> >> be as confused as I was when I wrote that patch, but doesn't a minimal
> >> fix look something like this?
> >
> > Yes, I sent a patch; it looks like what you show, but it also contains
> > some performance optimization:
> >
> > [PATCH] mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to
> > accelerate memory allocation
>
> Okay. It would be better to do it as two patches. First the minimal fix,
> so it can be backported easily and have the Fixes: tag pointed at my
> commit. Then the performance optimization.
>
It is not only your commit. It also fixes mine :)

<snip>
commit 5c1f4e690eecc795b2e4d4408e87302040fceca4
Author: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
Date:   Mon Jun 28 19:40:14 2021 -0700

    mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
<snip>

I agree there should be two separate patches fixing the NUMA spreading
issue, each tagged with Fixes:. One is located here:

https://lkml.org/lkml/2021/10/15/1172

The second one should fix the second place where "big" pages are
allocated, which is basically your patch.

--
Vlad Rezki
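
For readers following the thread, a rough, untested sketch of the direction
discussed above is shown between the <snip> markers below. It is not any of
the patches posted in this thread; the function name
sketch_vm_area_alloc_pages() is made up for illustration, and the
bulk-request capping, warnings and error handling of the real
vm_area_alloc_pages() are omitted. It only illustrates the two points the
thread converges on: take the node-pinned bulk path only when the caller
asked for a specific node (Shakeel's "!order && nid != NUMA_NO_NODE" check),
and for NUMA_NO_NODE fall back to mempolicy-aware alloc_pages() so
interleaving is preserved, including for the higher-order "big" page
allocations mentioned above.

<snip>
/* Sketch only -- not a posted patch; loosely mirrors vm_area_alloc_pages(). */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/numa.h>

static unsigned int
sketch_vm_area_alloc_pages(gfp_t gfp, int nid, unsigned int order,
                           unsigned int nr_pages, struct page **pages)
{
        unsigned int nr_allocated = 0;

        /*
         * The bulk allocator is per-node, so only use it when an explicit
         * node was requested; otherwise it would defeat the mempolicy.
         */
        if (!order && nid != NUMA_NO_NODE)
                nr_allocated = alloc_pages_bulk_array_node(gfp, nid,
                                                           nr_pages, pages);

        /* Allocate whatever the bulk path did not cover, page by page. */
        while (nr_allocated < nr_pages) {
                struct page *page;
                unsigned int i;

                if (nid == NUMA_NO_NODE)
                        page = alloc_pages(gfp, order); /* honours task mempolicy */
                else
                        page = alloc_pages_node(nid, gfp, order);
                if (!page)
                        break;

                /* Hand a compound allocation back as 1 << order base pages. */
                for (i = 0; i < (1U << order) && nr_allocated < nr_pages; i++)
                        pages[nr_allocated++] = page + i;
        }

        return nr_allocated;
}
<snip>

The performance optimization Chen refers to would go a step further and let
the NUMA_NO_NODE case keep using a bulk allocator that is aware of the
task's mempolicy, instead of the page-by-page fallback in the sketch above.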