On Thu, Oct 11, 2007 at 03:15:12PM +0100, Andrew Clayton wrote:
> On Thu, 11 Oct 2007 11:01:39 +1000, David Chinner wrote:
>
> > So it's almost certainly pointing at an elevator or driver change, not an
> > XFS change.
>
> heh, git bisect begs to differ :)
>
> 4c60658e0f4e253cf275f12b7c76bf128515a774 is first bad commit
> commit 4c60658e0f4e253cf275f12b7c76bf128515a774
> Author: David Chinner <dgc@xxxxxxx>
> Date: Sat Nov 11 18:05:00 2006 +1100
>
>     [XFS] Prevent a deadlock when xfslogd unpins inodes.

Oh, of course - I failed to notice the significance of this loop in
your test:

	while [foo]; do
		touch fred
		rm fred
	done

The inode allocator keeps reusing the same inode. If the transaction
that did the unlink has not hit the disk before we allocate the inode
again, we have to force the log to get the unlink transaction to disk
to get the xfs inode unpinned (i.e. able to be modified in memory
again).

It's the log force I/O that's introducing the latency.

If we don't force the log, then we have a possible use-after free
of the linux inode because of a fundamental mismatch between
the XFS inode life cycle and the linux inode life cycle. The
use-after free only occurs on large machines under heavy, heavy
metadata load to many disks and filesystems (requires enough
traffic to overload an xfslogd) and is very difficult to
reproduce (large machine, lots of disks and 20-30 hours MTTF).

I'll have a look at other ways to solve this problem, but it
took 6 months to find a solution to the race in the first place
so don't hold your breath.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
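
For anyone who wants to watch the inode reuse described in the message, the touch/rm loop can be sketched as a small standalone script. This is only an illustration and not the original test: the function name `churn`, the temporary directory and the iteration count are assumptions. It reports how many distinct inode numbers the file `fred` received; it does not measure the log force latency itself.

```sh
#!/bin/sh
# Hypothetical sketch: create and unlink the same file name repeatedly and
# count how many distinct inode numbers the file was given.
# churn DIR COUNT   (name and arguments are illustrative assumptions)
churn() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        touch "$dir/fred"
        # ls -i prints "<inode number> <path>"; keep only the number
        ls -i "$dir/fred" | awk '{ print $1 }'
        rm "$dir/fred"
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}

# Run against a scratch directory (mktemp -d is assumed to be available).
scratch=$(mktemp -d)
echo "distinct inode numbers over 100 iterations: $(churn "$scratch" 100)"
rmdir "$scratch"
```

If the allocator hands the freed inode straight back, as described above, the count should stay small relative to the number of iterations. Running the same loop under `time` on the filesystem being tested gives a rough view of the per-iteration cost.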