For perf lock contention, the current owner tracking (-o option) only works with per-thread mode (-t option). Enabling call stack mode for owner can be useful for diagnosing why a system running slow in lock contention. Example output: $ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex -E16 perf bench sched pipe ... contended total wait max wait avg wait type caller 171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57 0xffffffffac6318e7 pipe_read+0x57 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50 0xffffffffac631ee0 pipe_write+0x50 0xffffffffac6241db vfs_write+0x3bb 0xffffffffac6244ab ksys_write+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d 0xffffffffac691f0d do_epoll_wait+0x24d 0xffffffffac69249b do_epoll_pwait.part.0+0xb 0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 2 20.88 us 11.95 us 10.44 us mutex do_epoll_wait+0x24d 0xffffffffac691f0d do_epoll_wait+0x24d 0xffffffffac693943 __x64_sys_epoll_wait+0x73 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 1 7.33 us 7.33 us 7.33 us mutex do_epoll_ctl+0x6c1 0xffffffffac692e01 do_epoll_ctl+0x6c1 0xffffffffac6937e0 __x64_sys_epoll_ctl+0x70 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 1 6.64 us 6.64 us 6.64 us mutex do_epoll_ctl+0x3d4 0xffffffffac692b14 do_epoll_ctl+0x3d4 0xffffffffac6937e0 __x64_sys_epoll_ctl+0x70 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 === owner stack trace === 3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348 0xffffffffac631bd8 pipe_read+0x348 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 ... v7: Remove duplicate contention records. v6: lore.kernel.org/20250219214400.3317548-1-ctshao@xxxxxxxxxx Free allocated memory in error patch. Add description in man page. v5: lore.kernel.org/20250212222859.2086080-1-ctshao@xxxxxxxxxx Move duplicated code into function. Remove code to retrieve undesired callstack at the end of `contention_end()`. Other minor fix based on Namhyung's review. v4: lore.kernel.org/20250130052510.860318-1-ctshao@xxxxxxxxxx Use `__sync_fetch_and_add()` to generate owner stackid automatically. Use `__sync_fetch_and_add(..., -1)` to workaround compiler error from `__sync_fetch_and_sub()` Remove unnecessary include headers. Dedicate on C-style comment. Other minor fix based on Namhyung's review. v3: lore.kernel.org/20250129001905.619859-1-ctshao@xxxxxxxxxx Rename/shorten multiple variables. Implement owner stackid. Add description for lock function return code in `contention_end()`. Other minor fix based on Namhyung's review. v2: lore.kernel.org/20250113052220.2105645-1-ctshao@xxxxxxxxxx Fix logic deficit in v1 patch 2/4. v1: lore.kernel.org/20250110051346.1507178-1-ctshao@xxxxxxxxxx Chun-Tse Shao (4): perf lock: Add bpf maps for owner stack tracing perf lock: Retrieve owner callstack in bpf program perf lock: Make rb_tree helper functions generic perf lock: Report owner stack in usermode tools/perf/Documentation/perf-lock.txt | 5 +- tools/perf/builtin-lock.c | 56 +++- tools/perf/util/bpf_lock_contention.c | 71 ++++- .../perf/util/bpf_skel/lock_contention.bpf.c | 245 +++++++++++++++++- tools/perf/util/bpf_skel/lock_data.h | 7 + tools/perf/util/lock-contention.h | 7 + 6 files changed, 365 insertions(+), 26 deletions(-) -- 2.48.1.658.g4767266eb4-goog