On Wed, Aug 25, 2010 at 02:48:36PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-08-25 at 07:57 -0400, Ted Ts'o wrote:
> > On Wed, Aug 25, 2010 at 01:35:32PM +0200, Peter Zijlstra wrote:
> > > On Wed, 2010-08-25 at 07:24 -0400, Ted Ts'o wrote:
> > > > Part of the problem is that we have a few places in the kernel where
> > > > failure is really not an option --- or rather, if we're going to fail
> > > > while we're in the middle of doing a commit, our choices really are
> > > > (a) retry the loop in the jbd layer (which Andrew really doesn't
> > > > like), (b) keep our own private cache of free memory so we don't fail
> > > > and/or loop, (c) fail the file system and mark it read-only, or (d)
> > > > panic.
> > >
> > > d) do the allocation before you're committed to going fwd and can still
> > > fail and back out.
> >
> > Sure in some cases that can be done, but the commit has to happen at
> > some point, or we run out of journal space, at which point we're back
> > to (c) or (d).
>
> Well (b) sounds a lot saner than either of those. Simply revert to a
> state that is sub-optimal but has bounded memory use and reserve that
> memory up-front. That way you can always get out of a tight memory spot.
>
> Its what the block layer has always done to avoid the memory deadlock
> situation, it has a private stash of BIOs that is big enough to always
> service some IO, and as long as IO is happening stuff keeps moving fwd
> and we don't deadlock.
>
> Filesystems might have a slightly harder time creating such a bounded
> state because there might be more involved like journals and the like,
> but still it should be possible to create something like that (my swap
> over nfs patches created such a state for the network rx side of
> things).

Filesystems are way more complex than the block layer - the block layer
simply doesn't have to handle situations where thread X is holding A, B
and C, while thread Y needs C to complete the transaction.
Thread Y is the user of the low memory pool, but has almost depleted it,
so even if we switch to thread X, the pool does not have enough memory
for X to complete and allow us to switch back to Y and have it complete,
freeing the memory from the pool that it holds.

That is, the guarantee that we will always make progress simply does not
exist in filesystems, so a mempool-like concept seems to me to be doomed
from the start....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
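
The thread X / thread Y scenario above can be sketched as a toy model. This is not kernel code and not anything from the thread: the class `ReservePool`, the function `can_make_progress`, and all the numbers are hypothetical, chosen only to show how a fixed reserve stops guaranteeing forward progress once one pool user depends on another task that also needs the pool.

```python
# Toy model only: a fixed reserve pool shared by two tasks, X and Y.
# Every name and number here is an assumption made for illustration.


class ReservePool:
    """A bounded reserve that never blocks and never grows."""

    def __init__(self, size):
        self.size = size
        self.free = size

    def alloc(self, n):
        # Succeed only if the reserve can cover the request right now.
        if n > self.free:
            return False
        self.free -= n
        return True

    def release(self, n):
        self.free += n


def can_make_progress(pool_size, y_holds, x_needs, y_waits_on_x):
    """Return True if Y can finish and hand its memory back to the pool.

    y_holds:      units Y has already taken from the reserve
    x_needs:      units X must get before it can release object C
    y_waits_on_x: True if Y needs C (held by X) to complete
    """
    pool = ReservePool(pool_size)
    if not pool.alloc(y_holds):
        raise ValueError("y_holds exceeds the pool size")

    if y_waits_on_x:
        # Y cannot complete until X releases C, and X cannot release C
        # until it gets memory from the same, nearly empty, reserve.
        if not pool.alloc(x_needs):
            return False  # X is stuck, so Y is stuck: nothing is freed
        pool.release(x_needs)  # X completes and releases C

    pool.release(y_holds)  # Y completes and returns what it held
    return pool.free == pool.size


# Independent users of the pool: progress is always possible.
independent = can_make_progress(8, 7, 2, y_waits_on_x=False)

# Y depends on X, and the reserve cannot cover X's needs: no progress.
dependent = can_make_progress(8, 7, 2, y_waits_on_x=True)
```

In this sketch the only difference between the two calls is the cross-task dependency. Making the pool larger only shifts the point at which the second case fails, because nothing bounds how much Y may hold while it waits for X.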