On Fri, Mar 22, 2024 at 11:21:02AM -0700, Guenter Roeck wrote:
> Hi,
>
> On Tue, Jan 02, 2024 at 07:46:29PM +0100, Uladzislau Rezki (Sony) wrote:
> > Concurrent access to a global vmap space is a bottle-neck.
> > We can simulate a high contention by running a vmalloc test
> > suite.
> >
> > To address it, introduce an effective vmap node logic. Each
> > node behaves as independent entity. When a node is accessed
> > it serves a request directly(if possible) from its pool.
> >
> > This model has a size based pool for requests, i.e. pools are
> > serialized and populated based on object size and real demand.
> > A maximum object size that pool can handle is set to 256 pages.
> >
> > This technique reduces a pressure on the global vmap lock.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
>
> This patch results in a persistent "spinlock bad magic" message
> when booting s390 images with spinlock debugging enabled.
>
> [ 0.465445] BUG: spinlock bad magic on CPU#0, swapper/0
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] [<00000000011f26c8>] dump_stack_lvl+0x98/0xd8
> [ 0.466516] [<00000000001dcc6a>] do_raw_spin_lock+0x8a/0x108
> [ 0.466545] [<000000000042146c>] find_vmap_area+0x6c/0x108
> [ 0.466572] [<000000000042175a>] find_vm_area+0x22/0x40
> [ 0.466597] [<000000000012f152>] __set_memory+0x132/0x150
> [ 0.466624] [<0000000001cc0398>] vmem_map_init+0x40/0x118
> [ 0.466651] [<0000000001cc0092>] paging_init+0x22/0x68
> [ 0.466677] [<0000000001cbbed2>] setup_arch+0x52a/0x708
> [ 0.466702] [<0000000001cb6140>] start_kernel+0x80/0x5c8
> [ 0.466727] [<0000000000100036>] startup_continue+0x36/0x40
>
> Bisect results and decoded stacktrace below.
>
> The uninitialized spinlock is &vn->busy.lock.
> Debugging shows that this lock is actually never initialized.
>
It is initialized once the vmalloc_init() "main entry" function is called, via:

<snip>
start_kernel()
  mm_core_init()
    vmalloc_init()
<snip>

> [ 0.464684] ####### locking 0000000002280fb8
> [ 0.464862] BUG: spinlock bad magic on CPU#0, swapper/0
> ...
> [ 0.464684] ####### locking 0000000002280fb8
> [ 0.477479] ####### locking 0000000002280fb8
> [ 0.478166] ####### locking 0000000002280fb8
> [ 0.478218] ####### locking 0000000002280fb8
> ...
> [ 0.718250] #### busy lock init 0000000002871860
> [ 0.718328] #### busy lock init 00000000028731b8
>
> Only the initialized locks are used after the call to vmap_init_nodes().
>
Right, once the vmap space and vmalloc are initialized.

> Guenter
>
> ---
> # bad: [8e938e39866920ddc266898e6ae1fffc5c8f51aa] Merge tag '6.9-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
> # good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
> git bisect start 'HEAD' 'v6.8'
> # good: [e56bc745fa1de77abc2ad8debc4b1b83e0426c49] smb311: additional compression flag defined in updated protocol spec
> git bisect good e56bc745fa1de77abc2ad8debc4b1b83e0426c49
> # bad: [902861e34c401696ed9ad17a54c8790e7e8e3069] Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad 902861e34c401696ed9ad17a54c8790e7e8e3069
> # good: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
> git bisect good 480e035fc4c714fb5536e64ab9db04fedc89e910
> # good: [fe46a7dd189e25604716c03576d05ac8a5209743] Merge tag 'sound-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
> git bisect good fe46a7dd189e25604716c03576d05ac8a5209743
> # bad: [435a75548109f19e5b5b14ae35b9acb063c084e9] mm: use folio more widely in __split_huge_page
> git bisect bad 435a75548109f19e5b5b14ae35b9acb063c084e9
> # good: [4d5bf0b6183f79ea361dd506365d2a471270735c] mm/mmu_gather: add tlb_remove_tlb_entries()
> git bisect good 4d5bf0b6183f79ea361dd506365d2a471270735c
> # bad: [4daacfe8f99f4b4cef562649d56c48642981f46e] mm/damon/sysfs-schemes: support PSI-based quota auto-tune
> git bisect bad 4daacfe8f99f4b4cef562649d56c48642981f46e
> # good: [217b2119b9e260609958db413876f211038f00ee] mm,page_owner: implement the tracking of the stacks count
> git bisect good 217b2119b9e260609958db413876f211038f00ee
> # bad: [40254101d87870b2e5ac3ddc28af40aa04c48486] arm64, crash: wrap crash dumping code into crash related ifdefs
> git bisect bad 40254101d87870b2e5ac3ddc28af40aa04c48486
> # bad: [53becf32aec1c8049b854f0c31a11df5ed75df6f] mm: vmalloc: support multiple nodes in vread_iter
> git bisect bad 53becf32aec1c8049b854f0c31a11df5ed75df6f
> # good: [7fa8cee003166ef6db0bba70d610dbf173543811] mm: vmalloc: move vmap_init_free_space() down in vmalloc.c
> git bisect good 7fa8cee003166ef6db0bba70d610dbf173543811
> # good: [282631cb2447318e2a55b41a665dbe8571c46d70] mm: vmalloc: remove global purge_vmap_area_root rb-tree
> git bisect good 282631cb2447318e2a55b41a665dbe8571c46d70
> # bad: [96aa8437d169b8e030a98e2b74fd9a8ee9d3be7e] mm: vmalloc: add a scan area of VA only once
> git bisect bad 96aa8437d169b8e030a98e2b74fd9a8ee9d3be7e
> # bad: [72210662c5a2b6005f6daea7fe293a0dc573e1a5] mm: vmalloc: offload free_vmap_area_lock lock
> git bisect bad 72210662c5a2b6005f6daea7fe293a0dc573e1a5
> # first bad commit: [72210662c5a2b6005f6daea7fe293a0dc573e1a5] mm: vmalloc: offload free_vmap_area_lock lock
>
> ---
> [ 0.465490] lock: single+0x1860/0x1958, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 0.466067] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-12955-g8e938e398669 #1
> [ 0.466188] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 0.466270] Call Trace:
> [ 0.466470] dump_stack_lvl (lib/dump_stack.c:117)
> [ 0.466516] do_raw_spin_lock (kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
> [ 0.466545] find_vmap_area (mm/vmalloc.c:1059 mm/vmalloc.c:2364)
> [ 0.466572] find_vm_area (mm/vmalloc.c:3150)
> [ 0.466597] __set_memory (arch/s390/mm/pageattr.c:360 arch/s390/mm/pageattr.c:393)
> [ 0.466624] vmem_map_init (./arch/s390/include/asm/set_memory.h:55 arch/s390/mm/vmem.c:660)
> [ 0.466651] paging_init (arch/s390/mm/init.c:97)
> [ 0.466677] setup_arch (arch/s390/kernel/setup.c:972)
> [ 0.466702] start_kernel (init/main.c:899)
> [ 0.466727] startup_continue (arch/s390/kernel/head64.S:35)
> [ 0.466811] INFO: lockdep is turned off.
>
In this trace find_vm_area() is reached from setup_arch(), i.e. before
mm_core_init() and therefore before vmalloc_init() has run, so the lock
is not yet initialized at that point. The change below makes
find_vmap_area() return early in that case:

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 22aa63f4ef63..0d77d171b5d9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2343,6 +2343,9 @@ struct vmap_area *find_vmap_area(unsigned long addr)
 	struct vmap_area *va;
 	int i, j;
 
+	if (unlikely(!vmap_initialized))
+		return NULL;
+
 	/*
 	 * An addr_to_node_id(addr) converts an address to a node index
 	 * where a VA is located. If VA spans several zones and passed
<snip>

Could you please test it?

--
Uladzislau Rezki