On Nov 27, 2018, at 3:19 PM, Alexander Lochmann <alexander.lochmann@xxxxxxxxxxxxxx> wrote: > > Hi folks, > > during the past months we've been developing LockDoc, a trace-based > approach of Lock Analysis in the Linux Kernel. > LockDoc uses execution traces of an instrumented Linux Kernel to > automatically deduce > locking rules for all members of arbitrary kernel data structures. > The traces are gathered running a manually selected fs-specific subset > of the Linux Test Project in a virtual machine. > These locking rules can be used to generate a comprehensive locking > documentation and to reveal potential bugs. This is quite interesting, and looks useful provided that there is a workload that exercises the various codepaths. How long does such an analysis take to run, and is there a plan to make this functionality available to developers (e.g. either as a web portal, or to be able to download the code and run it locally)? > LockDoc generates rules for each tuple of (data structure, member, {r,w}). > It completely ignores any element of type atomic{,64,long}_t as well as > atomic_*() functions. > Accesses during initialization and destruction of objects are also ignored. > The output of LockDoc looks like this: > inode member: i_flags [w] (15 lock combinations) > hypotheses: 96 > 15.8% (88 out of 558 mem accesses): EMBOTHER(inode.i_rwsem[w]) -> > EMBSAME(inode.i_rwsem[w]) > counterexample.sql.sh inode w:i_flags CEX SEQ > 'EMBOTHER(inode.i_rwsem[w])' 'EMBSAME(inode.i_rwsem[w])' > 15.8% (88 out of 558 mem accesses): EMBOTHER(inode.i_rwsem[w]) > counterexample.sql.sh inode w:i_flags CEX SEQ > 'EMBOTHER(inode.i_rwsem[w])' > ! 99.8% (557 out of 558 mem accesses): EMBSAME(inode.i_rwsem[w]) > ! counterexample.sql.sh inode w:i_flags CEX SEQ > 'EMBSAME(inode.i_rwsem[w])' > 100% (558 out of 558 mem accesses): (no locks held) > (no counterexamples to be expected, this hypothesis has 100% support > in the observation set) > > In this example LockDoc concludes that the lock > "EMBSAME(inode.i_rwsem[w])" is necessary for writing inode.i_flags. > EMBSAME stands for the lock embedded in the inode being accessed. In > this case it is the i_rwsem. > To be more precise, the write lock (--> "[w]") of i_rwsem is needed. > Based on this methodology, we can determine code locations that do not > adhere to the deduced locking rules. > The reports on rule-violating code include the stack trace and the > actual locks held. Looking at the page, it isn't very clear where some of the callpaths go. For example, in the "writing inode:ext4.i_nlink-__i_nlink" case, it shows a callpath from vfs_symlink() calling drop_nlink(), but this is not called directly from vfs_symlink(). It is actually going through dir->i_op->symlink() (ext4_symlink() in our case), so it would be best to show that dependency. One minor bug in ext4_symlink() is that it should probably be calling clear_nlink() to make it clear the link count should be zero, instead of drop_nlink(), because we don't really want to trigger "remove_count" and because the later part of this function is using set_nlink(inode, 1) instead of inc_nlink() that decrements "remove_count" again. This does not solve the reported warning, however. In ext4_orphan_add(), called immediately after drop_nlink(), it checks: WARN_ON_ONCE(!(inode->i_state & (I_NEW | I_FREEING)) && !inode_is_locked(inode)); so this is the case where i_state has I_NEW set. The question is whether it is worthwhile to grab inode_lock() in this code, just to keep the lock checker happy, or whether the code can be annotated to tell the checker that I_NEW means i_rwsem is not needed. Cheers, Andreas > > We've now created a series of bug reports for the following data types: > struct inode (used by ext4), journal_t, and transaction_t. > We present the counterexamples for each tuple of (data structure, > member, {r,w}). > Depending on the complexity of the callgraph, the counterexamples are > either embedded in the callgraph or the callgraph is shown below them. > In the latter case, zooming can be enabled via a button in the heading. > > We kindly ask you to have a look at our findings and send us some > comments back: > https://ess.cs.tu-dortmund.de/lockdoc-bugs/ml/ > > Our approach has already revealed one real bug and one suspicious > situation. Both have been confirmed by Jan. > > Best regards, > Alex and Horst > > -- > Technische Universität Dortmund > Alexander Lochmann PGP key: 0xBC3EF6FD > Otto-Hahn-Str. 16 phone: +49.231.7556141 > D-44227 Dortmund fax: +49.231.7556116 > http://ess.cs.tu-dortmund.de/Staff/al > Cheers, Andreas
Attachment:
signature.asc
Description: Message signed with OpenPGP