Oops, I used the wrong email address of Paul. Sorry about that! I'll resend with a new address soon. Thanks, Namhyung On Tue, Feb 8, 2022 at 10:42 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote: > > Hello, > > There have been some requests for low-overhead kernel lock contention > monitoring. The kernel has CONFIG_LOCK_STAT to provide such an infra > either via /proc/lock_stat or tracepoints directly. > > However it's not light-weight and hard to be used in production. So > I'm trying to separate out the tracepoints and using them as a base to > build a new monitoring system. > > As the lockdep and lock_stat provide good hooks in the lock functions, > it'd be natural to reuse them. Actually I tried to use lockdep as is > but disables the functionality at runtime (initialize debug_locks = 0, > lock_stat = 0). But it still has unacceptable overhead and the > lockdep data structures also increase memory footprint unnecessarily. > > So I'm proposing a separate tracepoint-only configuration and keeping > lockdep_map only with minimal information needed for tracepoints (for > now, it's just name). And then the existing lockdep hooks can be used > for the tracepoints. > > The patch 01-06 are preparation for the work. In a few places in the > kernel, they calls lockdep annotation explicitly to deal with > limitations in the lockdep implementation. In my understanding, they > are not needed to analyze lock contention. > > To make matters worse, they rely on the compiler optimization (or > macro magic) so that it can get rid of the annotations and their > arguments when lockdep is not configured. > > But it's not true any more when lock tracepoints are added and it'd > cause build errors. So I added #ifdef guards for LOCKDEP in the code > to prevent such errors. > > In the patch 07 I mechanically changed most of code that depend on > CONFIG_LOCKDEP or CONFIG_DEBUG_LOCK_ALLOC to CONFIG_LOCK_INFO. It > paves the way for the codes to be shared for lockdep and tracepoints. > Mostly, it makes sure that locks are initialized with a proper name, > like in the patch 08 and 09. > > I didn't change some places intentionally - for example, timer and > workqueue depend on LOCKDEP explicitly since they use some lockdep > annotations to work around the "held lock freed" warnings. The ocfs2 > directly accesses lockdep_map.key so I didn't touch the code for now. > And RCU was because it generates too much overhead thanks to the > rcu_read_lock(). Maybe I need to revisit some of them later. > > I added CONFIG_LOCK_TRACEPOINTS in the patch 10 to make it optional. > I found that it adds 2~3% of overhead when I ran `perf bench sched > messaging` even when the tracepoints are disabled. The benchmark > creates a lot of processes and make them communicate by socket pairs > (or pipes). I measured that around 15% of lock acquisition creates > contentions and it's mostly for spin locks (alc->lock and u->lock). > > I ran perf record + report with the workload and it showed 50% of the > cpu cycles are in the spin lock slow path. So it's highly affected by > the spin lock slow path. Actually LOCK_CONTENDED() macro transforms > the spin lock code (and others) to use trylock first and then fall > back to real lock function if failed. Thus it'd add more (atomic) > operations and cache line bouncing for the contended cases: > > #define LOCK_CONTENDED(_lock, try, lock) \ > do { \ > if (!try(_lock)) { \ > lock_contended(&(_lock)->dep_map, _RET_IP_); \ > lock(_lock); \ > } \ > lock_acquired(&(_lock)->dep_map, _RET_IP_); \ > } while (0) > > If I modify the macro not to use trylock and to call the real lock > function directly (so the lock_contended tracepoint would be called > always, if enabled), the overhead goes down to 0.x% when the > tracepoints are disabled. > > I don't have a good solution as long as we use LOCK_CONTENDED() macro > to separate the contended locking path. Maybe we can make it call > some (generic) slow path lock function directly after failing trylock. > Or move the lockdep annotations into the each lock function bodies and > get rid of the LOCK_CONTENDED() macro entirely. Ideas? > > Actually the patch 11 handles the same issue on the mutex code. The > fast version of mutex trylock was attempted only if LOCKDEP is not > enabled and it affects the mutex lock performance in the uncontended > cases too. So I partially reverted the change in the patch 07 to use > the fast functions with lock tracepoints too. Maybe we can use it > with LOCKDEP as well? > > The last patch 12 might be controversial and I'd like to move the > lock_acquired annotation into the if(!try) block in the LOCK_CONTEDED > macro so that it can only be called when there's a contention. > > Eventually I'm mostly interested in the contended locks only and I > want to reduce the overhead in the fast path. By moving that, it'd be > easy to track contended locks with timing by using two tracepoints. > > It'd change lock hold time calculation in lock_stat for the fast path, > but I assume time difference between lock_acquire and lock_acquired > would be small when the lock is not contended. So I think we can use > the timestamp in lock_acquire. If it's not acceptable, we might > consider adding a new tracepoint to track the timing of contended > locks. > > This series base on the current tip/locking/core and you get it from > 'locking/tracepoint-v1' branch in my tree at: > > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git > > > Thanks, > Namhyung > > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx> > Cc: Tejun Heo <tj@xxxxxxxxxx> > Cc: Byungchul Park <byungchul.park@xxxxxxx> > Cc: rcu@xxxxxxxxxxxxxxx > Cc: cgroups@xxxxxxxxxxxxxxx > Cc: linux-btrfs@xxxxxxxxxxxxxxx > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx > > > Namhyung Kim (12): > locking: Pass correct outer wait type info > cgroup: rstat: Make cgroup_rstat_cpu_lock name readable > timer: Protect lockdep functions with #ifdef > workqueue: Protect lockdep functions with #ifdef > drm/i915: Protect lockdep functions with #ifdef > btrfs: change lockdep class size check using ks->names > locking: Introduce CONFIG_LOCK_INFO > locking/mutex: Init name properly w/ CONFIG_LOCK_INFO > locking: Add more static lockdep init macros > locking: Add CONFIG_LOCK_TRACEPOINTS option > locking/mutex: Revive fast functions for LOCK_TRACEPOINTS > locking: Move lock_acquired() from the fast path > > drivers/gpu/drm/drm_connector.c | 7 +- > drivers/gpu/drm/i915/i915_sw_fence.h | 2 +- > drivers/gpu/drm/i915/intel_wakeref.c | 3 + > drivers/gpu/drm/i915/selftests/lib_sw_fence.h | 2 +- > .../net/wireless/intel/iwlwifi/iwl-trans.c | 4 +- > .../net/wireless/intel/iwlwifi/iwl-trans.h | 2 +- > drivers/tty/tty_ldsem.c | 2 +- > fs/btrfs/disk-io.c | 4 +- > fs/btrfs/disk-io.h | 2 +- > fs/cifs/connect.c | 2 +- > fs/kernfs/file.c | 2 +- > include/linux/completion.h | 2 +- > include/linux/jbd2.h | 2 +- > include/linux/kernfs.h | 2 +- > include/linux/kthread.h | 2 +- > include/linux/local_lock_internal.h | 18 +- > include/linux/lockdep.h | 170 ++++++++++++++++-- > include/linux/lockdep_types.h | 8 +- > include/linux/mmu_notifier.h | 2 +- > include/linux/mutex.h | 12 +- > include/linux/percpu-rwsem.h | 4 +- > include/linux/regmap.h | 4 +- > include/linux/rtmutex.h | 14 +- > include/linux/rwlock_api_smp.h | 4 +- > include/linux/rwlock_rt.h | 4 +- > include/linux/rwlock_types.h | 11 +- > include/linux/rwsem.h | 14 +- > include/linux/seqlock.h | 4 +- > include/linux/spinlock_api_smp.h | 4 +- > include/linux/spinlock_rt.h | 4 +- > include/linux/spinlock_types.h | 4 +- > include/linux/spinlock_types_raw.h | 28 ++- > include/linux/swait.h | 2 +- > include/linux/tty_ldisc.h | 2 +- > include/linux/wait.h | 2 +- > include/linux/ww_mutex.h | 6 +- > include/media/v4l2-ctrls.h | 2 +- > include/net/sock.h | 2 +- > include/trace/events/lock.h | 4 +- > kernel/cgroup/rstat.c | 7 +- > kernel/locking/Makefile | 1 + > kernel/locking/lockdep.c | 40 ++++- > kernel/locking/mutex-debug.c | 2 +- > kernel/locking/mutex.c | 22 ++- > kernel/locking/mutex.h | 7 + > kernel/locking/percpu-rwsem.c | 2 +- > kernel/locking/rtmutex_api.c | 10 +- > kernel/locking/rwsem.c | 4 +- > kernel/locking/spinlock.c | 2 +- > kernel/locking/spinlock_debug.c | 4 +- > kernel/locking/spinlock_rt.c | 8 +- > kernel/locking/ww_rt_mutex.c | 2 +- > kernel/printk/printk.c | 14 +- > kernel/rcu/update.c | 27 +-- > kernel/time/timer.c | 8 +- > kernel/workqueue.c | 13 ++ > lib/Kconfig.debug | 14 ++ > mm/memcontrol.c | 7 +- > mm/mmu_notifier.c | 2 +- > net/core/dev.c | 2 +- > net/sunrpc/svcsock.c | 2 +- > net/sunrpc/xprtsock.c | 2 +- > 62 files changed, 391 insertions(+), 180 deletions(-) > > > base-commit: 1dc01abad6544cb9d884071b626b706e37aa9601 > -- > 2.35.0.263.gb82422642f-goog >