RFC: LockDoc - Detecting Locking Bugs in the Linux Kernel

Alexander Lochmann <alexander.lochmann@xxxxxxxxxxxxxx> · Tue, 27 Nov 2018 23:19:54 +0100

Hi folks,

during the past months we've been developing LockDoc, a trace-based
approach of Lock Analysis in the Linux Kernel.
LockDoc uses execution traces of an instrumented Linux Kernel to
automatically deduce
locking rules for all members of arbitrary kernel data structures.
The traces are gathered running a manually selected fs-specific subset
of the Linux Test Project in a virtual machine.
These locking rules can be used to generate a comprehensive locking
documentation and to reveal potential bugs.

LockDoc generates rules for each tuple of (data structure, member, {r,w}).
It completely ignores any element of type atomic{,64,long}_t as well as
atomic_*() functions.
Accesses during initialization and destruction of objects are also ignored.
The output of LockDoc looks like this:
inode member: i_flags [w] (15 lock combinations)
  hypotheses: 96
     15.8% (88 out of 558 mem accesses): EMBOTHER(inode.i_rwsem[w]) ->
EMBSAME(inode.i_rwsem[w])
    counterexample.sql.sh inode w:i_flags CEX SEQ
'EMBOTHER(inode.i_rwsem[w])' 'EMBSAME(inode.i_rwsem[w])'
     15.8% (88 out of 558 mem accesses): EMBOTHER(inode.i_rwsem[w])
    counterexample.sql.sh inode w:i_flags CEX SEQ
'EMBOTHER(inode.i_rwsem[w])'
!    99.8% (557 out of 558 mem accesses): EMBSAME(inode.i_rwsem[w])
!   counterexample.sql.sh inode w:i_flags CEX SEQ
'EMBSAME(inode.i_rwsem[w])'
      100% (558 out of 558 mem accesses): (no locks held)
    (no counterexamples to be expected, this hypothesis has 100% support
in the observation set)

In this example LockDoc concludes that the lock
"EMBSAME(inode.i_rwsem[w])" is necessary for writing inode.i_flags.
EMBSAME stands for the lock embedded in the inode being accessed. In
this case it is the i_rwsem.
To be more precise, the write lock (--> "[w]") of i_rwsem is needed.
Based on this methodology, we can determine code locations that do not
adhere to the deduced locking rules.
The reports on rule-violating code include the stack trace and the
actual locks held.

We've now created a series of bug reports for the following data types:
struct inode (used by ext4), journal_t, and transaction_t.
We present the counterexamples for each tuple of (data structure,
member, {r,w}).
Depending on the complexity of the callgraph, the counterexamples are
either embedded in the callgraph or the callgraph is shown below them.
In the latter case, zooming can be enabled via a button in the heading.

We kindly ask you to have a look at our findings and send us some
comments back:
https://ess.cs.tu-dortmund.de/lockdoc-bugs/ml/

Our approach has already revealed one real bug and one suspicious
situation. Both have been confirmed by Jan.

Best regards,
Alex and Horst

-- 
Technische Universität Dortmund
Alexander Lochmann                PGP key: 0xBC3EF6FD
Otto-Hahn-Str. 16                 phone:  +49.231.7556141
D-44227 Dortmund                  fax:    +49.231.7556116
http://ess.cs.tu-dortmund.de/Staff/al

Attachment:
signature.asc

Description: OpenPGP digital signature