On Aug 15, 2009  15:57 -0400, Christoph Hellwig wrote:
> On Fri, Aug 14, 2009 at 05:25:05PM +0200, Nick Piggin wrote:
> > Now I think the main problem is having the filesystem block (and do IO)
> > in inode reclaim. The problem is that this doesn't get accounted well
> > and penalizes a random allocator with a big latency spike caused by
> > work generated from elsewhere.
> >
> > I think the best idea would be to avoid this. By design if possible,
> > or by deferring the hard work to an asynchronous context. If the latter,
> > then the fs would probably want to throttle creation of new work with
> > queue size of the deferred work, but let's not get into those details.
>
> I don't really see a good way to avoid this.  For any filesystem that
> does some sort of preallocations we need to drop them in ->clear_inode.

One of the problems I've seen in the past is that filesystem memory
reclaim (in particular dentry/inode cleanup) cannot happen within
filesystems due to potential deadlocks.  This is particularly problematic
when there is a lot of memory pressure from within the kernel and very
little from userspace (e.g. updatedb or find).

However, many/most inodes/dentries in the filesystem could be discarded
quite easily and would not deadlock the system.  I wonder if it makes
sense to keep a mask in the inode that the filesystem could set that
determines whether it is safe to clean up the inode even though __GFP_FS
is not set?

That would potentially allow e.g. shrink_icache_memory() to free a large
number of "non-tricky" inodes if needed (e.g. ones without
locks/preallocation/expensive cleanup).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
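
A rough userspace sketch of the per-inode mask idea, for illustration only.
It is not kernel code and not a patch from this thread.  The names
inode_sketch, I_SKETCH_EASY_RECLAIM, SKETCH_GFP_FS, can_reclaim() and
shrink_icache_sketch() are all hypothetical, and the flag values are
stand-ins rather than the real kernel definitions.  The assumption is that
the filesystem marks an inode as cheap to evict, and that the shrinker
consults that mark when the caller's allocation context does not allow
re-entering the filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-inode flag: the filesystem sets it when eviction needs
 * no locks, no preallocation teardown and no IO.  Not a real kernel flag. */
#define I_SKETCH_EASY_RECLAIM 0x1u

/* Stand-in for the allocation-context bit; the real __GFP_FS has a
 * different value and type. */
#define SKETCH_GFP_FS 0x80u

struct inode_sketch {
    unsigned int flags;  /* reclaim-safety mask set by the filesystem */
    bool in_use;         /* still referenced, so never reclaimed here */
    bool freed;          /* set once the shrinker has discarded it */
};

/* May this inode be freed in the given allocation context? */
static bool can_reclaim(const struct inode_sketch *inode, unsigned int gfp_mask)
{
    if (inode->in_use || inode->freed)
        return false;
    if (gfp_mask & SKETCH_GFP_FS)
        return true;  /* caller may re-enter the fs: any unused inode goes */
    /* No fs re-entry allowed: only inodes the fs marked as cheap. */
    return (inode->flags & I_SKETCH_EASY_RECLAIM) != 0;
}

/* Model of one shrinker pass: frees what is allowed, returns the count. */
static size_t shrink_icache_sketch(struct inode_sketch *inodes, size_t n,
                                   unsigned int gfp_mask)
{
    size_t freed = 0;

    assert(inodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (can_reclaim(&inodes[i], gfp_mask)) {
            inodes[i].freed = true;
            freed++;
        }
    }
    return freed;
}
```

In this model a pass without SKETCH_GFP_FS still makes progress on the
marked inodes instead of freeing nothing, while the "tricky" ones wait for
a context where filesystem re-entry is allowed.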