* Waiman Long <longman@xxxxxxxxxx> wrote: > The purpose of this patchset is to add a new class of locks called > terminal locks and converts some of the low level raw or regular > spinlocks to terminal locks. A terminal lock does not have forward > dependency and it won't allow a lock or unlock operation on another > lock. Two level nesting of terminal locks is allowed, though. > > Only spinlocks that are acquired with the _irq/_irqsave variants or > acquired in an IRQ disabled context should be classified as terminal > locks. > > Because of the restrictions on terminal locks, we can do simple checks on > them without using the lockdep lock validation machinery. The advantages > of making these changes are as follows: > > 1) The lockdep check will be faster for terminal locks without using > the lock validation code. > 2) It saves table entries used by the validation code and hence make > it harder to overflow those tables. > > In fact, it is possible to overflow some of the tables by running > a variety of different workloads on a debug kernel. I have seen bug > reports about exhausting MAX_LOCKDEP_KEYS, MAX_LOCKDEP_ENTRIES and > MAX_STACK_TRACE_ENTRIES. This patch will help to reduce the chance > of overflowing some of the tables. > > Performance wise, there was no statistically significant difference in > performanace when doing a parallel kernel build on a debug kernel. Could you please measure a locking intense workload instead, such as: $ perf stat --null --sync --repeat 10 perf bench sched messaging and profile which locks used there could be marked terminal, and measure the before/after performance impact? > Below were selected output lines from the lockdep_stats files of the > patched and unpatched kernels after bootup and running parallel kernel > builds. > > Item Unpatched kernel Patched kernel % Change > ---- ---------------- -------------- -------- > direct dependencies 9732 8994 -7.6% > dependency chains 18776 17033 -9.3% > dependency chain hlocks 76044 68419 -10.0% > stack-trace entries 110403 104341 -5.5% That's pretty impressive! > There were some reductions in the size of the lockdep tables. They were > not significant, but it is still a good start to rein in the number of > entries in those tables to make it harder to overflow them. Agreed. BTW., if you are interested in more radical approaches to optimize lockdep, we could also add a static checker via objtool driven call graph analysis, and mark those locks terminal that we can prove are terminal. This would require the unified call graph of the kernel image and of all modules to be examined in a final pass, but that's within the principal scope of objtool. (This 'final pass' could also be done during bootup, at least in initial versions.) Note that beyond marking it 'terminal' such a static analysis pass would also allow the detection of obvious locking bugs at the build (or boot) stage already - plus it would allow the disabling of lockdep for self-contained locks that don't interact with anything else. I.e. the static analysis pass would 'augment' lockdep and leave only those locks active for runtime lockdep tracking whose dependencies it cannot prove to be correct yet. Thanks, Ingo