On Sun, Mar 19, 2023 at 07:09:31AM +0000, Lorenzo Stoakes wrote:
> vmalloc() is, by design, not permitted to be used in atomic context and
> already contains components which may sleep, so avoiding spin locks is not
> a problem from the perspective of atomic context.
>
> The global vmap_area_lock is held when the red/black tree rooted in
> vmap_area_root is accessed and thus is rather long-held and under
> potentially high contention. It is likely to be under contention for reads
> rather than writes, so replace it with a rwsem.
>
> Each individual vmap_block->lock is likely to be held for less time but
> under low contention, so a mutex is not an outrageous choice here.
>
> A subset of test_vmalloc.sh performance results:-
>
>   fix_size_alloc_test             0.40%
>   full_fit_alloc_test             2.08%
>   long_busy_list_alloc_test       0.34%
>   random_size_alloc_test         -0.25%
>   random_size_align_alloc_test    0.06%
>   ...
>   all tests cycles                0.2%
>
> This represents a tiny reduction in performance that sits barely above
> noise.

I'm travelling right now, but give me a few days and I'll test this
against the XFS workloads that hammer the global vmalloc spin lock
really, really badly.

XFS can use vm_map_ram and vmalloc really heavily for metadata buffers
and hit the global spin lock from every CPU in the system at the same
time (i.e. highly concurrent workloads). vmalloc is also heavily used in
the hottest path through the journal, where we process and calculate
delta changes to several million items every second, again spread across
every CPU in the system at the same time.

We really need the global spinlock to go away completely, but in the
meantime a shared read lock should help a little bit....

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
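
[For reference, a minimal sketch of the read-mostly conversion the cover
text describes: the rbtree lookup path moving from an exclusive spinlock
to an rwsem taken shared, with tree modification still taking the lock
exclusively. This is illustrative only and not the actual patch; names
loosely follow mm/vmalloc.c (vmap_area_lock, vmap_area_root, find_vmap_area,
__find_vmap_area), types come from the vmalloc internals, and the writer
path body is elided.]

	#include <linux/rwsem.h>
	#include <linux/rbtree.h>

	/* Was: static DEFINE_SPINLOCK(vmap_area_lock); */
	static DECLARE_RWSEM(vmap_area_lock);

	static struct rb_root vmap_area_root = RB_ROOT;

	/* Lookup path: taken shared, so concurrent lookups no longer
	 * serialise against each other, only against writers. */
	static struct vmap_area *find_vmap_area(unsigned long addr)
	{
		struct vmap_area *va;

		down_read(&vmap_area_lock);	/* was: spin_lock(&vmap_area_lock); */
		va = __find_vmap_area(addr);	/* rbtree walk, unchanged */
		up_read(&vmap_area_lock);	/* was: spin_unlock(&vmap_area_lock); */

		return va;
	}

	/* Insertion/removal: taken exclusive, same mutual exclusion as
	 * the old spinlock provided. */
	static void insert_vmap_area(struct vmap_area *va)
	{
		down_write(&vmap_area_lock);
		/* link va into vmap_area_root and the sorted list, as before */
		up_write(&vmap_area_lock);
	}

[The point of the conversion is the reader side: a workload doing highly
concurrent lookups, as described above for XFS, contends only on the rwsem's
reader count rather than spinning on a single exclusive lock, while writers
keep the same exclusion they had under the spinlock.]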