The patch titled
     Subject: mm/vmalloc: use __this_cpu_try_cmpxchg() in preload_this_cpu_lock()
has been added to the -mm mm-unstable branch.  Its filename is
     mm-vmalloc-use-__this_cpu_try_cmpxchg-in-preload_this_cpu_lock.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-vmalloc-use-__this_cpu_try_cmpxchg-in-preload_this_cpu_lock.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Uros Bizjak <ubizjak@xxxxxxxxx>
Subject: mm/vmalloc: use __this_cpu_try_cmpxchg() in preload_this_cpu_lock()
Date: Tue, 28 May 2024 16:43:14 +0200

Use __this_cpu_try_cmpxchg() instead of
__this_cpu_cmpxchg (*ptr, old, new) == old in preload_this_cpu_lock().
x86 CMPXCHG instruction returns success in ZF flag, so this change saves a
compare after cmpxchg.

The generated code improves from:

    4bb6:  48 85 f6                test   %rsi,%rsi
    4bb9:  0f 84 10 fa ff ff       je     45cf <...>
    4bbf:  4c 89 e8                mov    %r13,%rax
    4bc2:  65 48 0f b1 35 00 00    cmpxchg %rsi,%gs:0x0(%rip)
    4bc9:  00 00
    4bcb:  48 85 c0                test   %rax,%rax
    4bce:  0f 84 fb f9 ff ff       je     45cf <...>

to:

    4bb6:  48 85 f6                test   %rsi,%rsi
    4bb9:  0f 84 10 fa ff ff       je     45cf <...>
    4bbf:  4c 89 e8                mov    %r13,%rax
    4bc2:  65 48 0f b1 35 00 00    cmpxchg %rsi,%gs:0x0(%rip)
    4bc9:  00 00
    4bcb:  0f 84 fe f9 ff ff       je     45cf <...>

No functional change intended.

Link: https://lkml.kernel.org/r/20240528144345.5980-2-ubizjak@xxxxxxxxx
Signed-off-by: Uros Bizjak <ubizjak@xxxxxxxxx>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Lorenzo Stoakes <lstoakes@xxxxxxxxx>
Cc: Dennis Zhou <dennis@xxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/vmalloc.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/vmalloc.c~mm-vmalloc-use-__this_cpu_try_cmpxchg-in-preload_this_cpu_lock
+++ a/mm/vmalloc.c
@@ -1816,7 +1816,7 @@ static void free_vmap_area(struct vmap_a
 static inline void
 preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
 {
-	struct vmap_area *va = NULL;
+	struct vmap_area *va = NULL, *tmp;
 
 	/*
 	 * Preload this CPU with one extra vmap_area object. It is used
@@ -1832,7 +1832,8 @@ preload_this_cpu_lock(spinlock_t *lock,
 
 	spin_lock(lock);
 
-	if (va && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, va))
+	tmp = NULL;
+	if (va && !__this_cpu_try_cmpxchg(ne_fit_preload_node, &tmp, va))
 		kmem_cache_free(vmap_area_cachep, va);
 }
 
_

Patches currently in -mm which might be from ubizjak@xxxxxxxxx are

percpu-add-__this_cpu_try_cmpxchg.patch
mm-vmalloc-use-__this_cpu_try_cmpxchg-in-preload_this_cpu_lock.patch
fork-use-this_cpu_try_cmpxchg-in-try_release_thread_stack_to_cache.patch
fork-use-this_cpu_try_cmpxchg-in-try_release_thread_stack_to_cache-fix.patch
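
For readers unfamiliar with the two calling conventions the changelog contrasts, here is a minimal userspace sketch. It is not kernel code: the names struct obj, slot_cmpxchg(), slot_try_cmpxchg(), preload_old_style() and preload_new_style() are hypothetical, and the GCC/Clang __atomic_compare_exchange_n() builtin is used only as a stand-in for the per-CPU operations, whose ordering and preemption rules differ. It is meant to show why comparing the value returned by a cmpxchg against the expected old value and testing the boolean result of a try_cmpxchg select the same branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the preloaded object; not the kernel's vmap_area. */
struct obj { int id; };

/*
 * Value-returning form (assumed analogue of a cmpxchg): store newval in
 * *slot only if *slot == old, and return the value that was found there.
 * The caller has to compare the result against 'old' to learn the outcome.
 */
static struct obj *slot_cmpxchg(struct obj **slot, struct obj *old,
				struct obj *newval)
{
	struct obj *expected = old;

	assert(slot != NULL);
	/* On failure the builtin writes the current *slot into 'expected'. */
	__atomic_compare_exchange_n(slot, &expected, newval, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return expected;
}

/*
 * Boolean form (assumed analogue of a try_cmpxchg): returns true on
 * success; on failure *oldp is updated with the value found in *slot.
 */
static bool slot_try_cmpxchg(struct obj **slot, struct obj **oldp,
			     struct obj *newval)
{
	assert(slot != NULL && oldp != NULL);
	return __atomic_compare_exchange_n(slot, oldp, newval, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Shape of the pre-patch test: true means the caller must free 'va'. */
static bool preload_old_style(struct obj **slot, struct obj *va)
{
	/* A non-NULL return means the slot was already occupied. */
	return va && slot_cmpxchg(slot, NULL, va) != NULL;
}

/* Shape of the post-patch test: true means the caller must free 'va'. */
static bool preload_new_style(struct obj **slot, struct obj *va)
{
	struct obj *tmp = NULL;

	return va && !slot_try_cmpxchg(slot, &tmp, va);
}
```

With an empty slot both helpers install the object and report that nothing needs freeing; with an occupied slot both leave it untouched and report that the spare object should be freed. The boolean form lets the compiler branch on the outcome of the exchange directly, which corresponds to the dropped test instruction in the disassembly quoted in the changelog.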