On Thu, Nov 17, 2011 at 11:29:15AM +1100, Dave Chinner wrote:
> Actually, that is not true: we know that transaction reservation in
> IO completion can hang systems. That's why we specifically shifted
> unwritten extent conversion to its own workqueue.
> The condition that I saw most frequently was transaction reservation
> blocking and preventing IO from completing and therefore marking
> memory clean and freeable, whilst the unwritten extent conversion
> transaction reservation was blocked because the tail of the log
> could not be moved due to memory allocation for inode buffers to
> write back the inodes pinning the tail of the log failing and hence
> looping endlessly in the buffer cache. If we do transaction
> reservations for every single extending IO completion during IO
> completion, we are effectively making the deadlock possible for most
> extending delayed allocation write IOs (i.e. the majority of write
> IO in a typical system).

That fact has changed:

commit 80641dc66a2d6dfb22af4413227a92b8ab84c7bb
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 19 04:00:03 2009 +0000

    xfs: I/O completion handlers must use NOFS allocations

    When completing I/O requests we must not allow the memory allocator to
    recurse into the filesystem, as we might deadlock on waiting for the
    I/O completion otherwise.  The only thing currently allocating normal
    GFP_KERNEL memory is the allocation of the transaction structure for
    the unwritten extent conversion.  Add a memflags argument to
    _xfs_trans_alloc to allow controlling the allocator behaviour.

So unwritten extent conversion will never recurse back into the
filesystem any more. If we still care about blocking on other things
(and I've hit this code fairly hard over the last months) we can
increase the per-cpu workqueue concurrency limit. Currently xfs limits
each of the workqueues to one item, but the default (when passing 0)
would be 512.
With that we could probably even get away with one per-filesystem
workqueue if we want, but I'd really prefer to get the size logging fix
out to users ASAP instead of doing all the fine-tuning now.

> > In addition to that implementing log reservations from ->writepage that
> > are only released by a different thread requires a lot of work,
>
> That's trivial to do - there's nothing thread specific about a
> transaction. All you need to do is ensure that everything is logged
> before you pass the struct xfs_trans off to another thread. We
> can do that simply by adding it to the ioend.

The transaction sets PF_FSTRANS in current->flags, which also includes
saving and restoring the current flags in (I can't figure out exactly
why we do the latter). Without this we trivially deadlock - I tried to
fix it, but it either didn't quite work or I ran into more deadlocks
though.
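
To illustrate the PF_FSTRANS point, here is a small user-space sketch,
not kernel code. struct task, struct trans, trans_begin(), trans_end()
and the bit value given to PF_FSTRANS are hypothetical stand-ins for
current, struct xfs_trans and the real flag handling; only the
set-then-restore pattern is modelled. It is meant to show that the flag
lives in the task that started the transaction, so passing the
transaction to another task leaves the flag behind.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit value only; not the kernel's definition. */
#define PF_FSTRANS 0x1u

/* Hypothetical stand-in for the per-task state behind current->flags. */
struct task {
    unsigned int flags;
};

/* Hypothetical stand-in for the flag bookkeeping kept with a transaction. */
struct trans {
    unsigned int saved_flags;   /* task flags as they were before begin */
};

/* Remember the caller's flags, then mark the caller as inside a transaction. */
static inline void trans_begin(struct task *cur, struct trans *tp)
{
    tp->saved_flags = cur->flags;
    cur->flags |= PF_FSTRANS;
}

/* Put PF_FSTRANS in the given task back to the value saved at begin time. */
static inline void trans_end(struct task *cur, const struct trans *tp)
{
    cur->flags = (cur->flags & ~PF_FSTRANS) |
                 (tp->saved_flags & PF_FSTRANS);
}

static inline bool in_trans(const struct task *t)
{
    return (t->flags & PF_FSTRANS) != 0;
}

/*
 * Begin in one task, finish in another: the flag was set in the writer,
 * so ending the transaction from the worker never clears it there.
 */
static inline bool handoff_leaves_writer_flagged(void)
{
    struct task writer = { 0 }, worker = { 0 };
    struct trans tp;

    trans_begin(&writer, &tp);          /* e.g. reserved at writepage time */
    assert(in_trans(&writer) && !in_trans(&worker));

    trans_end(&worker, &tp);            /* finished by the completion side */
    return in_trans(&writer);
}
```

In this model the task that finishes the transaction never had the flag
set, and the task that started it keeps the flag until something clears
it explicitly. A task that already had the flag set before trans_begin()
still has it after trans_end(), which is one effect of saving and
restoring rather than simply setting and clearing.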