On Wed, Feb 21, 2007 at 01:09:56PM +0000, Jörn Engel wrote: > 1. Introduction > > This lengthy investigation was caused by a deadlock problem in LogFS, > but uncovered a more general problem. It affects, at the least, all > filesystems that need to read inodes in their write path. To my > knowledge, that includes LogFS and NTFS, possibly also JFS and XFS. I don't think XFS has any problems here - we're pretty careful about reading inodes from disk before we lock other potentially dependent objects in the filesystem.... > Deadlock happens when two processes A and B (that may be identical) have > these two call chains: > > Process A: Process B: > inode_wait [filesystem locking write path] > __wait_on_bit __writeback_single_inode > out_of_line_wait_on_bit > ifind_fast > [filesystem calling iget()] > [filesystem locking write path] > > Process A will wait_on_inode() and block until process B proceeds > through __sync_single_inode(), which is called from > __writeback_single_inode() in Process B. Process B will block on the > lock of the filesystem write path, held by Process A. This is caused by your cleaner thread racing with writeback doing inode lookup, right? You need a non-blocking inode lookup to prevent the deadlock, I guess.... > 2. The usage of inode_lock and I_LOCK ..... > Three rows have not exposed their meaning to me yet, so I'd > gladly receive some insight here. IIRC, the checks in xfs_ichgtime[_fast] are simply an optimisation - if the inode is currently I_LOCKed then we can't mark it dirty anyway so we don't even bother trying. We do mark the XFS inode structure (*) dirty, though, so the modification will make it to disk at some time in the future. (*) XFS does double inode caching because the XFS transaction subsystem requires inodes to have a different lifecycle to the linux struct inode lifecycle. > 3. Seperating out sync notification > > One of the results from the investigations in 2 appears to be that one > class of users in fs/fs-writeback.c is completely unrelated to another > class of users in fs/inode.c. > > In particular, __sync_single_inode(), __writeback_single_inode(), > write_inode_now(), clear_inode(), __mark_inode_dirty() and (possibly?) > generic_osync_inode() seem to only need a completion event to > synchronize with. There is no reason why this group should share a lock > with iget() and any of its many permutations. Seems reasonable, but I don't know all the little details in these paths.... > Now, if the group in fs/fs-writeback.c had a completion event that is > independent of anything in fs/inode.c, the deadlock scenario described > in section 1 goes away. As a further result, ilookup5_nowait() can get > removed as its only user, NTFS, only introduced it as a workaround for > the deadlock scenario. Could you use ilookup5_nowait() in LogFS and avoid the cleaner deadlock that way? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html