Re: [RFC] The many faces of the I_LOCK

David Chinner <dgc@xxxxxxx> · Thu, 22 Feb 2007 05:29:04 +1100

On Wed, Feb 21, 2007 at 01:09:56PM +0000, JÃ¶rn Engel wrote:
> 1. Introduction
> 
> This lengthy investigation was caused by a deadlock problem in LogFS,
> but uncovered a more general problem.  It affects, at the least, all
> filesystems that need to read inodes in their write path.  To my
> knowledge, that includes LogFS and NTFS, possibly also JFS and XFS.

I don't think XFS has any problems here - we're pretty careful about
reading inodes from disk before we lock other potentially dependent
objects in the filesystem....

> Deadlock happens when two processes A and B (that may be identical) have
> these two call chains:
> 
> Process A:				Process B:
> inode_wait				[filesystem locking write path]
> __wait_on_bit				__writeback_single_inode
> out_of_line_wait_on_bit
> ifind_fast
> [filesystem calling iget()]
> [filesystem locking write path]
> 
> Process A will wait_on_inode() and block until process B proceeds
> through __sync_single_inode(), which is called from
> __writeback_single_inode() in Process B.  Process B will block on the
> lock of the filesystem write path, held by Process A.

This is caused by your cleaner thread racing with writeback doing inode
lookup, right? You need a non-blocking inode lookup to prevent the deadlock,
I guess....

> 2. The usage of inode_lock and I_LOCK
.....
> Three rows have not exposed their meaning to me yet, so I'd
> gladly receive some insight here.

IIRC, the checks in xfs_ichgtime[_fast] are simply an optimisation - if the
inode is currently I_LOCKed then we can't mark it dirty anyway so we don't
even bother trying. We do mark the XFS inode structure (*) dirty, though, so
the modification will make it to disk at some time in the future.

(*) XFS does double inode caching because the XFS transaction subsystem
requires inodes to have a different lifecycle to the linux struct inode
lifecycle.

> 3. Seperating out sync notification
> 
> One of the results from the investigations in 2 appears to be that one
> class of users in fs/fs-writeback.c is completely unrelated to another
> class of users in fs/inode.c.
> 
> In particular, __sync_single_inode(), __writeback_single_inode(),
> write_inode_now(), clear_inode(), __mark_inode_dirty() and (possibly?)
> generic_osync_inode() seem to only need a completion event to
> synchronize with.  There is no reason why this group should share a lock
> with iget() and any of its many permutations.

Seems reasonable, but I don't know all the little details in these
paths....

> Now, if the group in fs/fs-writeback.c had a completion event that is
> independent of anything in fs/inode.c, the deadlock scenario described
> in section 1 goes away.  As a further result, ilookup5_nowait() can get
> removed as its only user, NTFS, only introduced it as a workaround for
> the deadlock scenario.

Could you use ilookup5_nowait() in LogFS and avoid the cleaner deadlock
that way?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html