Re: [PATCH v2 1/2] SUNRPC: Fix memory reclaim deadlocks in rpciod

On Wed, 10 Sep 2014 15:48:43 +0200 Michal Hocko <mhocko@xxxxxxx> wrote:

> On Tue 09-09-14 12:33:46, Neil Brown wrote:
> > On Thu, 4 Sep 2014 15:54:27 +0200 Michal Hocko <mhocko@xxxxxxx> wrote:
> > 
> > > [Sorry for jumping in so late - I've been busy the last few days]
> > > 
> > > On Wed 27-08-14 16:36:44, Mel Gorman wrote:
> > > > On Tue, Aug 26, 2014 at 08:00:20PM -0400, Trond Myklebust wrote:
> > > > > On Tue, Aug 26, 2014 at 7:51 PM, Trond Myklebust
> > > > > <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
> > > > > > On Tue, Aug 26, 2014 at 7:19 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > > [...]
> > > > > >> wait_on_page_writeback() is a hammer, and we need to be better about
> > > > > >> this once we have per-memcg dirty writeback and throttling, but I
> > > > > >> think that really misses the point.  Even if memcg writeback waiting
> > > > > >> were smarter, any length of time spent waiting for yourself to make
> > > > > >> progress is absurd.  We just shouldn't be solving deadlock scenarios
> > > > > >> through arbitrary timeouts on one side.  If you can't wait for IO to
> > > > > >> finish, you shouldn't be passing __GFP_IO.
> > > 
> > > Exactly!
> > 
> > This is overly simplistic.
> > The code that cannot wait may be further up the call chain and not in a
> > position to avoid passing __GFP_IO.
> > In many cases it isn't that "you can't wait for IO" in general, but that you
> > cannot wait for one specific IO request.
> 
> Could you be more specific, please? Why would a particular IO make any
> difference to general IO from the same path? My understanding was that
> once the page is marked PG_writeback then it is about to be written to
> its destination and if there is any need for memory allocation it had
> better not allow IO from reclaim.

The more complex the filesystem, the harder it is to "not allow IO from
reclaim".
For NFS (which started this thread) there might be a need to open a new
connection - so allocating in the networking code would all need to be
careful.
And it isn't impossible that a 'gss' credential needs to be re-negotiated,
and that might even need user-space interaction (not sure of details).

What you say certainly used to be the case, and very often still is.  But it
doesn't really scale with the complexity of filesystems.

I don't think there is (yet) any need to optimise for allocations that don't
disallow IO happening in the writeout path.  But I do think waiting
indefinitely for a particular IO is unjustifiable.
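As a rough illustration of why a per-allocation __GFP_IO check cannot by itself rule out waiting on one specific IO, here is a minimal userspace model. Everything in it is hypothetical and is not kernel code: `MODEL_GFP_IO` stands in for __GFP_IO with an arbitrary value, and `struct model_page` records which task the writeback's completion depends on (for example, a task that must open a new connection or renegotiate a gss credential further up the call chain).

```c
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_IO; the value is arbitrary. */
#define MODEL_GFP_IO 0x1u

/* Hypothetical model of a page: whether it is under writeback, and
 * which task's forward progress that writeback depends on (-1: none). */
struct model_page {
    bool under_writeback;
    int  completion_depends_on_task;
};

/* The rule quoted in the thread: reclaim may wait on writeback only
 * if the allocation was allowed to do IO. */
bool mask_allows_wait(unsigned int gfp_mask)
{
    return (gfp_mask & MODEL_GFP_IO) != 0;
}

/* The mask check passes, yet the wait can never finish if the waiter
 * is the very task this particular IO is waiting for. */
bool wait_would_deadlock(unsigned int gfp_mask, int waiter_task,
                         const struct model_page *page)
{
    return mask_allows_wait(gfp_mask) && page->under_writeback &&
           page->completion_depends_on_task == waiter_task;
}
```

For a page whose writeback depends on task 7, `wait_would_deadlock(MODEL_GFP_IO, 7, &p)` is true even though the mask permits IO, while the same wait from any other task, or from task 7 without the IO bit, is reported as safe. The mask describes the allocation, not which IO the waiter is entangled with.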

> 
> > wait_on_page_writeback() waits for a specific IO and so is dangerous.
> > congestion_wait() or similar waits for IO in general and so is much safer.
> 
> congestion_wait was actually not sufficient to prevent OOM with a
> heavy writer in a small memcg. We simply do not know how long the
> IO will last, so any "wait for a random timeout" will end up causing some
> trouble.

I certainly accept that "congestion_wait" isn't a sufficient solution.
The thing I like about it is that it combines a timeout with a measure of
activity.
As long as writebacks are completing, it is reasonable to
wait_on_page_writeback().  But if no writebacks have completed for a while,
then it seems pointless waiting on this page any more.  Best to try to make
forward progress with whatever memory you can find.
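The policy described above (keep waiting while writebacks are completing, give up once none have completed for a while) can be sketched as a discrete-time userspace simulation. This is only a model under stated assumptions, not the actual wait_on_page_writeback() or congestion_wait() implementation; the tick arrays, `idle_limit`, and `wait_on_writeback_bounded()` are all hypothetical.

```c
#include <stdbool.h>

/* Outcome of the bounded wait. */
enum wait_result { WAIT_DONE, WAIT_GAVE_UP };

/*
 * Hypothetical userspace model, not kernel code.
 * Time advances in discrete ticks:
 *   page_done[t]  - the writeback we are waiting on finished at tick t
 *   progress[t]   - at least one *other* writeback completed at tick t
 *   idle_limit    - consecutive ticks with no completions before giving up
 * *ticks_waited receives the tick at which the wait ended.
 */
enum wait_result wait_on_writeback_bounded(const bool *page_done,
                                           const bool *progress,
                                           int nticks, int idle_limit,
                                           int *ticks_waited)
{
    int idle = 0;

    for (int t = 0; t < nticks; t++) {
        *ticks_waited = t;
        if (page_done[t])
            return WAIT_DONE;      /* our IO finished: waiting paid off */
        if (progress[t])
            idle = 0;              /* writeback is moving: keep waiting */
        else if (++idle >= idle_limit)
            return WAIT_GAVE_UP;   /* no activity for a while: stop */
    }
    *ticks_waited = nticks;
    return WAIT_GAVE_UP;           /* ran out of simulated time */
}
```

For example, with `idle_limit` = 2, a page that finishes at tick 5 while other writebacks complete at ticks 0, 2 and 4 returns WAIT_DONE after 5 ticks, whereas a page whose writeback never completes, with no other completions either, returns WAIT_GAVE_UP at tick 1. The timeout only fires in the absence of activity, so a slow but progressing device is still waited on.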

NeilBrown


