Hi all,

As core counts rapidly expand over the next four years, Namhyung and I were looking at global locks that we are already seeing high contention on today. Some of these are not MM specific:

- cgroup_mutex
- cgroup_threadgroup_rwsem
- tasklist_lock
- kernfs_mutex (although this should now be substantially better with the kernfs_locks array)

Others *are* MM specific:

- list_lrus_mutex
- pcpu_drain_mutex
- shrinker_mutex (formerly shrinker_rwsem)
- vmap_purge_lock
- slab_mutex

This is only looking at fleet data for global static locks, not locks like zone->lock that are dynamically allocated. (mmap_lock was substantially improved by per-vma locking, although it does still show up for very large vmas.)

A couple of questions:

(1) How are people quantifying these pain points, if at all, in synthetic testing? Are there any workloads or benchmarks that are particularly good at surfacing this in the lab, beyond the traditional will-it-scale? (The above is from production data.)

(2) Is anybody working on any of the above global locks? I'm trying to surface gaps for locks that will likely become even more painful in the coming years.

Thanks!
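
For concreteness on (1): below is a minimal strawman of the kind of directed stressor I mean, not anything we run in production. It just hammers cgroup creation/removal from many threads, which serializes on cgroup_mutex; the mount path, thread count, and iteration count are placeholders to adjust for the machine under test, and it assumes cgroup2 mounted at /sys/fs/cgroup and root privileges.

/*
 * Strawman stressor: N threads each repeatedly create and remove a
 * private child cgroup. cgroup mkdir/rmdir take cgroup_mutex, so this
 * concentrates contention on that one global lock.
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define BASE     "/sys/fs/cgroup"   /* assumed cgroup2 mount point */
#define NTHREADS 64                 /* placeholder thread count */
#define ITERS    10000              /* placeholder iteration count */

static void *worker(void *arg)
{
	long id = (long)arg;
	char path[256];
	int i;

	snprintf(path, sizeof(path), BASE "/lock_stress_%ld", id);

	for (i = 0; i < ITERS; i++) {
		/* each mkdir/rmdir of a cgroup directory takes cgroup_mutex */
		if (mkdir(path, 0755) && errno != EEXIST)
			perror("mkdir");
		if (rmdir(path))
			perror("rmdir");
	}
	return NULL;
}

int main(void)
{
	pthread_t tids[NTHREADS];
	long i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, worker, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);
	return 0;
}

Something like this is easy to point lock profiling at, but it only exercises one lock at a time, which is why I'm curious what broader workloads people are using.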