On Tue, Mar 1, 2022 at 11:47 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Mar 01, 2022 at 09:53:54AM -0500, Steven Rostedt wrote:
> > On Tue, 1 Mar 2022 10:05:12 +0100
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > > On Mon, Feb 28, 2022 at 05:04:11PM -0800, Namhyung Kim wrote:
> > > > The __mutex_lock_slowpath() and friends are declared as noinline and
> > > > _RET_IP_ returns its caller as mutex_lock, which is not meaningful.
> > > > Pass the ip from mutex_lock() to have the actual caller info in the trace.
> > >
> > > Blergh, can't you do a very limited unwind when you do the tracing
> > > instead? 3 or 4 levels should be plenty fast and sufficient.
> >
> > Is there a fast and sufficient way that works across architectures?
>
> The normal stacktrace API? Or the fancy new arch_stack_walk() which is
> already available on most architectures you actually care about and
> risc-v :-)
>
> Remember, this is the contention path, we're going to stall anyway,
> doing a few levels of unwind shouldn't really hurt at that point.
>
> Anyway; when I wrote that this morning, I was thinking:
>
>	unsigned long ips[4];
>	stack_trace_save(ips, 4, 0);

When I collected stack traces in BPF, the BPF program itself already
consumed 3 or 4 entries, so I had to increase the size to 8 and skip 4.
But it didn't add noticeable overhead in my test.

Thanks,
Namhyung
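
For reference, a minimal sketch of the limited unwind Peter describes,
built on the generic stack_trace_save() API. The helper name, the skip
count of 2, and the trace_printk() consumer are illustrative guesses,
not the actual patch under discussion:

	#include <linux/kernel.h>
	#include <linux/stacktrace.h>

	#define CONTENTION_UNWIND_DEPTH	4

	/*
	 * Hypothetical helper, called from the mutex slowpath once we
	 * know we are going to block.  The skip count of 2 is a guess at
	 * how many slowpath frames to drop so that ips[0] is the real
	 * mutex_lock() caller.
	 */
	static noinline void trace_mutex_contention_stack(void)
	{
		unsigned long ips[CONTENTION_UNWIND_DEPTH];
		unsigned int nr, i;

		nr = stack_trace_save(ips, CONTENTION_UNWIND_DEPTH, 2);

		/* Placeholder consumer; a real user would feed a tracepoint. */
		for (i = 0; i < nr; i++)
			trace_printk("mutex contention from %pS\n", (void *)ips[i]);
	}

Since this runs only on the contention path, where the task is about to
stall anyway, the cost of unwinding a handful of frames is amortized
against the block, which is Peter's point above.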
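
Namhyung's adjustment maps to the skip argument on the BPF side: the
frames added by the BPF/tracing plumbing have to be skipped before the
contended caller shows up. A rough sketch using the bpf_get_stack()
helper, where the buffer of 8 entries and skip count of 4 follow his
numbers but the kprobe attach point and bpf_printk() output are
placeholders:

	// SPDX-License-Identifier: GPL-2.0
	#include <linux/bpf.h>
	#include <linux/ptrace.h>
	#include <bpf/bpf_helpers.h>

	char LICENSE[] SEC("license") = "GPL";

	SEC("kprobe/__mutex_lock_slowpath")
	int mutex_contention(struct pt_regs *ctx)
	{
		__u64 ips[8] = {};
		long len;

		/*
		 * The low bits of the flags argument are the number of
		 * frames to skip; 4 drops the BPF/tracing frames sitting
		 * on top of the contended caller.
		 */
		len = bpf_get_stack(ctx, ips, sizeof(ips), 4);
		if (len > 0)
			bpf_printk("captured %ld bytes of kernel stack", len);

		return 0;
	}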