Re: How to handle TIF_MEMDIE stalls?

On Wed, Mar 04, 2015 at 09:41:01PM +0900, Tetsuo Handa wrote:
> Dave Chinner wrote:
> > On Fri, Feb 27, 2015 at 09:42:55PM +0900, Tetsuo Handa wrote:
> > > If kswapd0 is blocked forever at e.g. mutex_lock() inside shrinker
> > > functions, who else can make forward progress?
> > 
> > You can't get into these filesystem shrinkers when you do GFP_NOIO
> > allocations, as the IO path does.
> > 
> > > Shouldn't we avoid calling functions which could potentially block for
> > > unpredictable duration (e.g. unkillable locks and/or completion) from
> > > shrinker functions?
> > 
> > No, because otherwise we can't throttle allocation and reclaim to
> > the rate at which IO can clean dirty objects. i.e. we do this for
> > the same reason we throttle page cache dirtying to the rate at which
> > we can clean dirty pages....
> 
> I'm misunderstanding something. The description for kswapd() function
> in mm/vmscan.c says "This basically trickles out pages so that we have
> _some_ free memory available even if there is no other activity that frees
> anything up".

Sure.

> Forever blocking kswapd0 somewhere inside filesystem shrinker functions is
> equivalent to removing the kswapd() function, because it also prevents
> non-filesystem shrinker functions from being called by kswapd0, doesn't it?

Yes, but that's not intentional. Remember, we keep talking about the
filesystem not being able to guarantee forwards progress if
allocations block forever? Well...

> Then, the description will become "We won't have _some_ free memory available
> if there is no other activity that frees anything up", won't it?

... we've ended up blocking kswapd because it's waiting on a journal
commit to complete, and that journal commit is blocked waiting for
forwards progress in memory allocation...
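A toy model of that chain, written as a wait-for graph in C: each actor records what it is blocked on, and a stall is a cycle. The actor names (KSWAPD, JOURNAL_COMMIT, ALLOCATOR) and has_wait_cycle() are hypothetical labels for this sketch, not kernel code; it only shows why nothing in the loop can unwind on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors in the stall; labels for this sketch only. */
enum actor { KSWAPD, JOURNAL_COMMIT, ALLOCATOR, NR_ACTORS };

/* waits_on[a] is the actor that a is blocked on, or -1 if a can run.
 * Each actor waits on at most one other, so following the edges from
 * 'start' either reaches a runnable actor or comes back around.
 * Returns true if 'start' is itself part of a wait cycle. */
bool has_wait_cycle(const int waits_on[NR_ACTORS], int start)
{
    assert(start >= 0 && start < NR_ACTORS);

    int cur = start;
    /* A cycle through 'start' has at most NR_ACTORS edges. */
    for (int steps = 0; steps < NR_ACTORS; steps++) {
        cur = waits_on[cur];
        if (cur < 0)
            return false; /* chain ends at an actor that can make progress */
        if (cur == start)
            return true;  /* back where we started: nobody can proceed */
    }
    return false;
}
```

With ALLOCATOR waiting on KSWAPD the three actors form a cycle; if the commit's allocation can be satisfied without waiting on reclaim (waits_on[ALLOCATOR] = -1), the chain ends and kswapd eventually runs again.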

Yes, it's another one of those nasty dependencies I keep pointing
out that filesystems have, and that can only be solved by
guaranteeing we can always make forwards allocation progress from
transaction reserve to transaction commit.
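A minimal sketch of that reserve-to-commit guarantee, assuming a simple page-count reservation: the worst-case need is taken up front, where waiting or failing is still safe, and every allocation between reserve and commit is served from it so it never has to enter reclaim. struct reserve_pool, txn_reserve(), txn_alloc() and txn_commit() are hypothetical names for illustration, not an existing kernel or XFS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool of pages set aside for transactions. */
struct reserve_pool {
    size_t free_pages; /* pages not yet promised to any transaction */
};

/* Hypothetical per-transaction accounting. */
struct txn {
    size_t reserved; /* pages promised to this transaction */
    size_t used;     /* pages consumed so far */
};

/* Reserve the worst case before taking any journal resources. This is the
 * only step allowed to fail (in a real system it might instead wait). */
bool txn_reserve(struct reserve_pool *pool, struct txn *t, size_t worst_case)
{
    if (pool->free_pages < worst_case)
        return false;
    pool->free_pages -= worst_case;
    t->reserved = worst_case;
    t->used = 0;
    return true;
}

/* Allocation inside the transaction: drawn from the reservation, so it
 * never blocks on reclaim and so never waits on kswapd. */
void txn_alloc(struct txn *t, size_t pages)
{
    assert(t->used + pages <= t->reserved); /* reservation must cover it */
    t->used += pages;
}

/* Commit: the unused part of the reservation goes back to the pool. Pages
 * actually consumed are not returned here (freeing them is not modelled). */
void txn_commit(struct reserve_pool *pool, struct txn *t)
{
    pool->free_pages += t->reserved - t->used;
    t->reserved = 0;
    t->used = 0;
}
```

The point of the design is that the only blocking point sits before the transaction holds anything the commit depends on, so a commit can never end up waiting on kswapd.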

> Does kswapd0 exist only for reducing the delay caused by reclaiming
> synchronously? Disabling kswapd0 affects nothing about functionality?
> The system can make forward progress even if nobody can call non-filesystem
> shrinkers, can't it?

The throttling is required to control the unbounded parallelism of
direct reclaim. If we don't do this, inode cache reclaim causes
random inode writeback and thrashes the disks with random IO,
causing severe degradation in performance under heavy memory
pressure. So we throttle inode reclaim to a single thread per allocation group (AG) so
we get nice sequential IO patterns from inode cache reclaim - the
difference is that we can reclaim several hundred thousand dirty
inodes per second versus a few hundred...
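The per-AG throttle can be modelled in user space with one mutex per allocation group: every reclaimer, kswapd or direct, takes the AG's lock before writing back that AG's inodes, so at most one thread issues IO for a given AG. This is a pthreads sketch; struct ag, reclaim_ag() and run_reclaimers() are hypothetical names, and the counter increment stands in for the actual inode writeback.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_AGS 4          /* hypothetical number of allocation groups */
#define MAX_RECLAIMERS 64 /* arbitrary cap for this sketch */

/* Per-AG state; names are illustrative, not the XFS structures. */
struct ag {
    pthread_mutex_t reclaim_lock; /* serialises reclaim within this AG */
    atomic_int active;            /* reclaimers currently inside this AG */
    atomic_int max_active;        /* high-water mark of 'active' */
    atomic_long reclaimed;        /* inodes "written back" from this AG */
};

static struct ag ags[NR_AGS];

void ags_init(void)
{
    for (int i = 0; i < NR_AGS; i++) {
        pthread_mutex_init(&ags[i].reclaim_lock, NULL);
        atomic_init(&ags[i].active, 0);
        atomic_init(&ags[i].max_active, 0);
        atomic_init(&ags[i].reclaimed, 0);
    }
}

/* Reclaim nr inodes from one AG. A caller sleeps on the AG's lock if
 * another thread is already reclaiming it, so writeback for that AG is
 * issued by one thread at a time. */
void reclaim_ag(int agno, long nr)
{
    struct ag *ag = &ags[agno];

    pthread_mutex_lock(&ag->reclaim_lock); /* the throttle point */
    int now = atomic_fetch_add(&ag->active, 1) + 1;
    int prev = atomic_load(&ag->max_active);
    while (now > prev &&
           !atomic_compare_exchange_weak(&ag->max_active, &prev, now))
        ; /* on failure prev is reloaded with the current maximum */
    atomic_fetch_add(&ag->reclaimed, nr); /* stands in for writeback */
    atomic_fetch_sub(&ag->active, 1);
    pthread_mutex_unlock(&ag->reclaim_lock);
}

/* A hypothetical direct reclaimer hammering one AG. */
static void *direct_reclaimer(void *arg)
{
    int agno = *(const int *)arg;
    for (int i = 0; i < 1000; i++)
        reclaim_ag(agno, 1);
    return NULL;
}

/* Start nthreads concurrent reclaimers against one AG and wait for them.
 * &agno stays valid because every thread is joined before returning. */
void run_reclaimers(int agno, int nthreads)
{
    pthread_t tids[MAX_RECLAIMERS];

    assert(nthreads > 0 && nthreads <= MAX_RECLAIMERS);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, direct_reclaimer, &agno);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
}
```

After run_reclaimers(0, 8), ags[0].max_active stays at 1 even with eight concurrent reclaimers; the other seven spend their time asleep on the lock, which is the same kind of place Tetsuo saw kswapd0 sleeping in mutex_lock().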

And because memory allocation is bound by reclaim speed, we throttle
the direct reclaimers to prevent IO breakdown conditions from
occurring and hence keep performance under memory pressure
relatively high and mostly predictable.
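How direct reclaimers end up throttled can be sketched as a two-pass shrinker: the first sweep uses trylock and skips AGs that someone else is already reclaiming; if it freed too little and the caller is allowed to wait, it sweeps again taking the locks unconditionally, so the caller sleeps until the current owner's IO is done. This trylock-then-wait structure is an assumed pattern for illustration, not a copy of the XFS code; shrink_inodes(), ag_lock and ag_dirty are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_AGS 4 /* hypothetical number of allocation groups */

static pthread_mutex_t ag_lock[NR_AGS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};
static long ag_dirty[NR_AGS]; /* reclaimable inodes left in each AG */

/* Reclaim up to 'want' inodes from an AG whose lock the caller holds. */
static long reclaim_locked(int agno, long want)
{
    assert(agno >= 0 && agno < NR_AGS);
    long n = ag_dirty[agno] < want ? ag_dirty[agno] : want;
    ag_dirty[agno] -= n; /* stands in for writing back and freeing n inodes */
    return n;
}

/* One shrinker call. Contended AGs are skipped on the first sweep; if work
 * remains and may_wait is set, sweep again with blocking locks, so the
 * caller waits behind whoever is reclaiming that AG. That wait is what
 * bounds the caller's allocation rate by reclaim speed. */
long shrink_inodes(long nr_to_scan, bool may_wait)
{
    long freed = 0;
    bool trylock = true;
    bool skipped;

restart:
    skipped = false;
    for (int ag = 0; ag < NR_AGS && freed < nr_to_scan; ag++) {
        if (trylock) {
            if (pthread_mutex_trylock(&ag_lock[ag]) != 0) {
                skipped = true; /* someone else owns this AG: move on */
                continue;
            }
        } else {
            pthread_mutex_lock(&ag_lock[ag]); /* sleep: the throttle */
        }
        freed += reclaim_locked(ag, nr_to_scan - freed);
        pthread_mutex_unlock(&ag_lock[ag]);
    }
    if (skipped && may_wait && freed < nr_to_scan) {
        trylock = false;
        goto restart;
    }
    return freed;
}
```

With may_wait false a contended AG is simply skipped, so a caller that must not sleep still gets what the other AGs can give; with may_wait true the caller cannot outrun the thread that is already cleaning inodes.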

It's rare that kswapd actually gets stuck like this - I've only ever
seen it once, and I've never had anyone running a production system
report deadlocks like this...

> I can't understand the difference between "kswapd0 sleeping forever at
> too_many_isolated() loop inside shrink_inactive_list()" and "kswapd0
> sleeping forever at mutex_lock() inside xfs_reclaim_inodes_ag()".

I don't really care.

The direct reclaim behaviour is a much bigger problem, and the risk
of occasionally having problems with kswapd is minuscule in
comparison. Sure, you can provoke it, but unless you are intentionally
doing nasty things to production systems, it will never be a problem
that you trip over.

We can't solve every problem with the current memory
allocation/reclaim design - we've chosen the lesser evil here, and
we're going to have to live with it until we get a more robust
memory allocation subsystem implementation.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .