On Fri, May 14, 2021 at 11:19:20AM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 10:18:51PM +0200, Uladzislau Rezki wrote:
> > <SNIP>
> >
> > From the trace I get:
> >
> > <snip>
> > [    0.243916] RIP: 0010:__alloc_pages+0x11e/0x310
> > [    0.243916] Code: 84 c0 0f 85 02 01 00 00 89 d8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 83 e0 01 88 44 24 20 48 8b 04 24 48 85 d2 0f 85 71 01 00 00 <3b> 70 08 0f 82 68 01 00 00 48 89 44 24 10 48 8b 00 89 da 81 e2 00
> > [    0.243916] RSP: 0000:ffffffffae803c38 EFLAGS: 00010246
> > [    0.243916] RAX: 0000000000001cc0 RBX: 0000000000002102 RCX: 0000000000000004
> > [    0.243916] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002102
> > [    0.243916] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
> > [    0.243916] R10: 0000000000000001 R11: ffffffffae803ac0 R12: 0000000000000000
> > [    0.243916] R13: 0000000000002102 R14: 0000000000000001 R15: ffffa0938000d000
> > [    0.243916] FS:  0000000000000000(0000) GS:ffff893ab7c00000(0000) knlGS:0000000000000000
> > [    0.243916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.243916] CR2: 0000000000001cc8 CR3: 0000000176e10000 CR4: 00000000000006b0
> > [    0.243916] Call Trace:
> > [    0.243916]  __alloc_pages_bulk+0xaa1/0xb50
> > <snip>
> >
> > (gdb) l *__alloc_pages+0x11e
> > 0xffffffff8129d87e is in __alloc_pages (./include/linux/mmzone.h:1095).
> > 1090		return zoneref->zone;
> > 1091	}
> > 1092
> > 1093	static inline int zonelist_zone_idx(struct zoneref *zoneref)
> > 1094	{
> > 1095		return zoneref->zone_idx;
> > 1096	}
> > 1097
> > 1098	static inline int zonelist_node_idx(struct zoneref *zoneref)
> > 1099	{
> > (gdb)
> >
> > Seems like "zoneref" refers to an invalid address.
> >
> > Thoughts?
>
> I have not previously read the patch but there are a few concerns and it's
> probably just as well this blew up early. The bulk allocator assumes a
> valid node but the patch can send in NUMA_NO_NODE (-1).
>
Should the bulk allocator handle NUMA_NO_NODE on its own? I mean,
instead of the user handling it, the allocator itself would fix the
node up if NUMA_NO_NODE is passed.

> On the high-order path alloc_pages_node() is used, which checks
> nid == NUMA_NO_NODE. Also, area->pages is not necessarily initialised so
> that could be interpreted as a partially populated array, so minimally
> you need:
>
The area->pages array is zeroed, because __GFP_ZERO is used when it is
allocated.

> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index f3c5dd4ccc5b9..b638ff31b07e1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2792,6 +2792,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	page_order = vm_area_page_order(area);
>  
>  	if (!page_order) {
> +		memset(area->pages, 0, array_size);
This memset is not needed, as explained above.

> +		if (node == NUMA_NO_NODE)
> +			node = numa_mem_id();
This patch works. Again, should it be handled by the allocator instead?

>  		area->nr_pages = __alloc_pages_bulk(gfp_mask, node,
>  			NULL, nr_small_pages, NULL, area->pages);
>  	} else {

> However, the high-order path also looks suspicious. area->nr_pages is
> advanced before the allocation attempt so in the event alloc_pages_node()
> returns NULL prematurely, area->nr_pages does not reflect the number of
> pages allocated so that needs examination.
>
<snip>
	for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
		area->nr_pages += 1U << page_order) {
<snip>

If alloc_pages_node() fails, we break the loop. area->nr_pages is
initialized inside the for(...) loop, thus it will be zero if the
single-page allocator fails on the first iteration.
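To illustrate, here is an abridged sketch of that loop, not the exact
hunk from the patch, just the shape of the failure path:

<snip>
	for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
		area->nr_pages += 1U << page_order) {
		struct page *page;

		page = alloc_pages_node(node, gfp_mask, page_order);
		if (unlikely(!page))
			break;	/* area->nr_pages counts only what succeeded */

		/* abridged; the real loop also fills area->pages[] here */
		...
	}
<snip>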
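And as for the NUMA_NO_NODE question raised above, what I have in mind
is something as simple as this inside the bulk allocator itself, a
hypothetical sketch against the current __alloc_pages_bulk() prototype:

<snip>
unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
			nodemask_t *nodemask, int nr_pages,
			struct list_head *page_list,
			struct page **page_array)
{
	...
	/*
	 * Hypothetical fix-up: mirror what alloc_pages_node() already
	 * does on the high-order path instead of relying on callers.
	 */
	if (preferred_nid == NUMA_NO_NODE)
		preferred_nid = numa_mem_id();
	...
}
<snip>

That way callers such as __vmalloc_area_node() would not need to
special-case the node at all.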
Or do I miss your point about the high-order path?

> As an aside, where or what is test_vmalloc.sh? It appears to have been
> used a few times but it's not clear it's representative so are you aware
> of workloads that are vmalloc-intensive? It does not matter for the
> patch as such but it would be nice to know examples of vmalloc-intensive
> workloads because I cannot recall a time during the last few years where
> I saw vmalloc.c high in profiles.
>
test_vmalloc.sh is a shell script used for stressing and testing the
vmalloc subsystem, as well as for performance evaluation. You can find
it here:

    ./tools/testing/selftests/vm/test_vmalloc.sh

As for workloads: mostly ones that are time- and latency-critical, for
example audio/video, especially in the mobile area. I did a big rework
of the KVA allocator because I found it sub-optimal in terms of
allocation time. To be more specific, high-resolution audio playback
was suffering from glitches due to long allocation times and preemption
issues.

--
Vlad Rezki