Am 28.11.18 um 21:34 schrieb Andreas Dilger: > On Nov 27, 2018, at 3:19 PM, Alexander Lochmann <alexander.lochmann@xxxxxxxxxxxxxx> wrote: >> >> Hi folks, >> >> during the past months we've been developing LockDoc, a trace-based >> approach of Lock Analysis in the Linux Kernel. >> LockDoc uses execution traces of an instrumented Linux Kernel to >> automatically deduce >> locking rules for all members of arbitrary kernel data structures. >> The traces are gathered running a manually selected fs-specific subset >> of the Linux Test Project in a virtual machine. >> These locking rules can be used to generate a comprehensive locking >> documentation and to reveal potential bugs. > > > This is quite interesting, and looks useful provided that there is a > workload that exercises the various codepaths. How long does such an > analysis take to run, and is there a plan to make this functionality > available to developers (e.g. either as a web portal, or to be able > to download the code and run it locally)? It takes ~34 min to gather the trace, and another 4,5h to process it. Yes, we intend to release our project. However, we do not have a schedule yet. > >> LockDoc generates rules for each tuple of (data structure, member, {r,w}). >> It completely ignores any element of type atomic{,64,long}_t as well as >> atomic_*() functions. >> Accesses during initialization and destruction of objects are also ignored. >> The output of LockDoc looks like this: >> inode member: i_flags [w] (15 lock combinations) >> hypotheses: 96 >> 15.8% (88 out of 558 mem accesses): EMBOTHER(inode.i_rwsem[w]) -> >> EMBSAME(inode.i_rwsem[w]) >> counterexample.sql.sh inode w:i_flags CEX SEQ >> 'EMBOTHER(inode.i_rwsem[w])' 'EMBSAME(inode.i_rwsem[w])' >> 15.8% (88 out of 558 mem accesses): EMBOTHER(inode.i_rwsem[w]) >> counterexample.sql.sh inode w:i_flags CEX SEQ >> 'EMBOTHER(inode.i_rwsem[w])' >> ! 99.8% (557 out of 558 mem accesses): EMBSAME(inode.i_rwsem[w]) >> ! counterexample.sql.sh inode w:i_flags CEX SEQ >> 'EMBSAME(inode.i_rwsem[w])' >> 100% (558 out of 558 mem accesses): (no locks held) >> (no counterexamples to be expected, this hypothesis has 100% support >> in the observation set) >> >> In this example LockDoc concludes that the lock >> "EMBSAME(inode.i_rwsem[w])" is necessary for writing inode.i_flags. >> EMBSAME stands for the lock embedded in the inode being accessed. In >> this case it is the i_rwsem. >> To be more precise, the write lock (--> "[w]") of i_rwsem is needed. >> Based on this methodology, we can determine code locations that do not >> adhere to the deduced locking rules. >> The reports on rule-violating code include the stack trace and the >> actual locks held. > > Looking at the page, it isn't very clear where some of the callpaths go. > For example, in the "writing inode:ext4.i_nlink-__i_nlink" case, it > shows a callpath from vfs_symlink() calling drop_nlink(), but this is not > called directly from vfs_symlink(). It is actually going through > dir->i_op->symlink() (ext4_symlink() in our case), so it would be best > to show that dependency. Thanks for pointing us to a GCC bug. :) Since drop_nlink() is a leaf function, GCC 8.2 does not generate the frame pointer code - even with CONFIG_FRAME_POINTER set. Using GCC 7.3 "fixes" this. (Disable -regparm=3 would also "fix" this issue with GCC 8.2. Since this concerns the kernel abi, this is not an option...) > > One minor bug in ext4_symlink() is that it should probably be calling > clear_nlink() to make it clear the link count should be zero, instead > of drop_nlink(), because we don't really want to trigger "remove_count" > and because the later part of this function is using set_nlink(inode, 1) > instead of inc_nlink() that decrements "remove_count" again. This does > not solve the reported warning, however. > > In ext4_orphan_add(), called immediately after drop_nlink(), it checks: > > WARN_ON_ONCE(!(inode->i_state & (I_NEW | I_FREEING)) && > !inode_is_locked(inode)); > > so this is the case where i_state has I_NEW set. The question is whether > it is worthwhile to grab inode_lock() in this code, just to keep the lock > checker happy, or whether the code can be annotated to tell the checker > that I_NEW means i_rwsem is not needed. Who can help us here? But it basically is a bug? Cheers, Alex > > Cheers, Andreas > >> >> We've now created a series of bug reports for the following data types: >> struct inode (used by ext4), journal_t, and transaction_t. >> We present the counterexamples for each tuple of (data structure, >> member, {r,w}). >> Depending on the complexity of the callgraph, the counterexamples are >> either embedded in the callgraph or the callgraph is shown below them. >> In the latter case, zooming can be enabled via a button in the heading. >> >> We kindly ask you to have a look at our findings and send us some >> comments back: >> https://ess.cs.tu-dortmund.de/lockdoc-bugs/ml/ >> >> Our approach has already revealed one real bug and one suspicious >> situation. Both have been confirmed by Jan. >> >> Best regards, >> Alex and Horst >> >> -- >> Technische Universität Dortmund >> Alexander Lochmann PGP key: 0xBC3EF6FD >> Otto-Hahn-Str. 16 phone: +49.231.7556141 >> D-44227 Dortmund fax: +49.231.7556116 >> http://ess.cs.tu-dortmund.de/Staff/al >> > > > Cheers, Andreas > > > > > -- Technische Universität Dortmund Alexander Lochmann PGP key: 0xBC3EF6FD Otto-Hahn-Str. 16 phone: +49.231.7556141 D-44227 Dortmund fax: +49.231.7556116 http://ess.cs.tu-dortmund.de/Staff/al
Attachment:
signature.asc
Description: OpenPGP digital signature