On Wed, May 18, 2016 at 08:35:49AM +1000, Dave Chinner wrote: > On Tue, May 17, 2016 at 04:49:12PM +0200, Peter Zijlstra wrote: > > On Tue, May 17, 2016 at 09:10:56AM +1000, Dave Chinner wrote: > > > > > The reason we don't have lock clases for the ilock is that we aren't > > > supposed to call memory reclaim with that lock held in exclusive > > > mode. This is because reclaim can run transactions, and that may > > > need to flush dirty inodes to make progress. Flushing dirty inode > > > requires taking the ilock in shared mode. > > > > > > In the code path that was reported, we hold the ilock in /shared/ > > > mode with no transaction context (we are doing a read-only > > > operation). This means we can run transactions in memory reclaim > > > because a) we can't deadlock on the inode we hold locks on, and b) > > > transaction reservations will be able to make progress as we don't > > > hold any locks it can block on. > > > > Just to clarify; I read the above as that we cannot block on recursive > > shared locks, is this correct? > > > > Because we can in fact block on down_read()+down_read() just fine, so if > > you're assuming that, then something's busted. > > The transaction reservation path will run down_read_trylock() on the > inode, not down_read(). Hence if there are no pending writers, it > will happily take the lock twice and make progress, otherwise it > will skip the inode and there's no deadlock. If there's a pending > writer, then we have another context that is already in a > transaction context and has already pushed the item, hence it is > only in the scope of the current push because IO hasn't completed > yet and removed it from the list. > > > Otherwise, I'm not quite reading it right, which is, given the > > complexity of that stuff, entirely possible. > > There's a maze of dark, grue-filled twisty passages here... OK; I might need a bit more again. So now the code does something like: down_read(&i_lock); -- lockdep marks lock as held kmalloc(GFP_KERNEL); -- lockdep marks held locks as ENABLED_RECLAIM_FS --> reclaim() down_read_trylock(&i_lock); -- lockdep does _NOT_ mark as USED_IN_RECLAIM_FS Right? My 'problem' is that lockdep doesn't consider a trylock for the USED_IN annotation, so the i_lock class will only get the ENABLED tag but not get the USED_IN tag, and therefore _should_ not trigger the inversion. So what exactly is triggering the inversion? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>