Re: [PATCH v2 1/2] SUNRPC: Fix memory reclaim deadlocks in rpciod

NeilBrown <neilb@xxxxxxx> · Wed, 27 Aug 2014 11:43:23 +1000

On Tue, 26 Aug 2014 19:19:38 -0400 Johannes Weiner <hannes@xxxxxxxxxxx> wrote:

> On Tue, Aug 26, 2014 at 02:26:24PM +0100, Mel Gorman wrote:

> > It'd be nice if the memcg people could comment on whether they plan to
> > handle the fact that memcg is the only caller of wait_on_page_writeback
> > in direct reclaim paths.
> 
> wait_on_page_writeback() is a hammer, and we need to be better about
> this once we have per-memcg dirty writeback and throttling, but I
> think that really misses the point.  Even if memcg writeback waiting
> were smarter, any length of time spent waiting for yourself to make
> progress is absurd.  We just shouldn't be solving deadlock scenarios
> through arbitrary timeouts on one side.  If you can't wait for IO to
> finish, you shouldn't be passing __GFP_IO.

I think that is overly simplistic.  Certainly "waiting for yourself" is
absurd, but it can be hard to know if that is what you are doing.
Refusing to wait at all just because you might be waiting for yourself is
also absurd.
Direct reclaim already has "congestion_wait()" calls which wait a little
while, just in case.  Doing that when you find a page in writeback might not
be such a bad thing.

When this becomes an issue, writeout is already slowing everything down, and
maybe slowing down a bit more isn't much cost.

> 
> Can't you use mempools like the other IO paths?

mempools and other pre-allocation strategies are appropriate for block
devices and critical for any "swap out" path.
Filesystems have traditionally got by without them, using GFP_NOFS when
necessary.
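
For readers unfamiliar with the pre-allocation idea, here is a minimal
userspace sketch of it, not kernel code.  The names mempool_model,
pool_alloc, pool_free and the fail_allocs switch are hypothetical and exist
only for illustration.  It assumes the usual mempool behaviour: try the
normal allocator first, and fall back to a small reserve that was filled
when the pool was created.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MIN 2                 /* size of the guaranteed reserve */

/* Hypothetical userspace model of a mempool, for illustration only. */
struct mempool_model {
    void  *reserve[POOL_MIN];      /* elements set aside at init time */
    int    nr_free;                /* how many reserve slots are filled */
    size_t elem_size;
    int    fail_allocs;            /* non-zero simulates memory pressure */
};

/* Stand-in for the normal allocator, which can fail under pressure. */
static void *backing_alloc(struct mempool_model *mp)
{
    return mp->fail_allocs ? NULL : malloc(mp->elem_size);
}

/* Fill the reserve up front, while memory is still plentiful. */
static int pool_init(struct mempool_model *mp, size_t size)
{
    mp->elem_size = size;
    mp->nr_free = 0;
    mp->fail_allocs = 0;
    while (mp->nr_free < POOL_MIN) {
        void *e = malloc(size);
        if (!e) {
            while (mp->nr_free > 0)
                free(mp->reserve[--mp->nr_free]);
            return -1;
        }
        mp->reserve[mp->nr_free++] = e;
    }
    return 0;
}

/* Try the normal allocator first, then dip into the reserve. */
static void *pool_alloc(struct mempool_model *mp)
{
    void *e = backing_alloc(mp);
    if (e)
        return e;
    if (mp->nr_free > 0)
        return mp->reserve[--mp->nr_free];
    return NULL;   /* a real pool would wait here for an element to be freed */
}

/* Refill the reserve before handing memory back to the allocator. */
static void pool_free(struct mempool_model *mp, void *e)
{
    assert(e != NULL);
    if (mp->nr_free < POOL_MIN)
        mp->reserve[mp->nr_free++] = e;
    else
        free(e);
}
```

The forward-progress guarantee only holds if every element taken from the
reserve is eventually freed without needing a further allocation, which is
easy to arrange in a block driver and much harder across a filesystem plus
a network stack.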

GFP_NOFS was originally meant to be set when holding filesystem-internal
locks.  Setting it everywhere that memory might be allocated while handling
write-out is a very different use-case.

Setting GFP_NOFS in more and more places doesn't really scale very well and
is particularly awkward for NFS as lots of network interfaces don't allow
setting GFP flags, and the network maintainers really don't want them to.

The recent direct-reclaim changes to get kswapd and the flush-* threads to do
most of the work made it much easier to avoid deadlocks.  Direct reclaim no
longer calls ->writepage and doesn't wait_on_page_writeback().  Except when
handling memory pressure for a memcg.

It's not an easy problem, but I don't think that "use mempools" is a valid
answer.  A simple rule like "direct reclaim never blocks indefinitely" is, I
think, quite achievable and would resolve a whole class of deadlocks.
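
As a rough illustration of that rule, here is a small userspace model, not
kernel code.  The types and names (page_model, reclaim_page_bounded, the
tick counter standing in for time) are all hypothetical.  The point is only
that the reclaimer waits for writeback up to a fixed bound and then skips
the page, so it cannot block forever on writeback that depends on itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page; ticks stand in for elapsed time. */
struct page_model {
    int writeback_ticks_left;   /* 0: clean, >0: ticks to go, <0: never ends */
};

enum reclaim_result { RECLAIMED, SKIPPED };

static bool page_under_writeback(const struct page_model *p)
{
    return p->writeback_ticks_left != 0;
}

/* Advance simulated time by one tick; stuck writeback (<0) never finishes. */
static void tick(struct page_model *p)
{
    if (p->writeback_ticks_left > 0)
        p->writeback_ticks_left--;
}

/*
 * Wait for writeback, but never for more than max_wait_ticks.
 * If the bound is hit, give up on this page and let the caller move on.
 */
static enum reclaim_result reclaim_page_bounded(struct page_model *p,
                                                int max_wait_ticks)
{
    assert(max_wait_ticks >= 0);
    for (int waited = 0; page_under_writeback(p); waited++) {
        if (waited >= max_wait_ticks)
            return SKIPPED;
        tick(p);
    }
    return RECLAIMED;
}
```

A page whose writeback completes within the bound is reclaimed as before; a
page whose writeback is stuck behind the reclaimer itself is skipped after
the bound expires instead of deadlocking.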

Thanks,
NeilBrown
