On Wed, Aug 30, 2017 at 10:42:07AM +0200, Peter Zijlstra wrote:
>
> So the overhead looks to be spread out over all sorts, which makes it
> harder to find and fix.
>
> stack unwinding is done lots and is fairly expensive, I've not yet
> checked if crossrelease does too much of that.

Aah, we do an unconditional stack unwind for every __lock_acquire() now.
It keeps a trace in the xhlocks[].

Does the below cure most of that overhead?

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 44c8d0d17170..7b872036b72e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4872,7 +4872,7 @@ static void add_xhlock(struct held_lock *hlock)
 	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
 	xhlock->trace.entries = xhlock->trace_entries;
 	xhlock->trace.skip = 3;
-	save_stack_trace(&xhlock->trace);
+	/* save_stack_trace(&xhlock->trace); */
 }
 
 static inline int same_context_xhlock(struct hist_lock *xhlock)