Re: [Bug 192981] New: page allocation stalls

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Fri, 17 Feb 2017 20:11:09 +0900

On 2017/02/17 7:21, Dave Chinner wrote:
> FWIW, the major problem with removing the blocking in inode reclaim
> is the ease with which you can then trigger the OOM killer from
> userspace.  The high level memory reclaim algorithms break down when
> there are hundreds of direct reclaim processes hammering on reclaim
> and reclaim stops making progress because it's skipping dirty
> objects.  Direct reclaim ends up insufficiently throttled, so rather
> than blocking it winds up reclaim priority and then declares OOM
> because reclaim runs out of retries before sufficient memory has
> been freed.
> 
> That, right now, looks to be an unsolvable problem without a major
> rework of direct reclaim.  I've pretty much given up on ever getting
> the unbound direct reclaim concurrency problem that is causing us
> these problems fixed, so we are left to handle it in the subsystem
> shrinkers as best we can. That leaves us with an unfortunate choice: 
> 
> 	a) throttle excessive concurrency in the shrinker to prevent
> 	   IO breakdown, thereby causing reclaim latency bubbles
> 	   under load but having a stable, reliable system; or
> 	b) optimise for minimal reclaim latency and risk userspace
> 	   memory demand triggering the OOM killer whenever there
> 	   are lots of dirty inodes in the system.
> 
> Quite frankly, there's only one choice we can make in this
> situation: reliability is always more important than performance.

Is it possible to get rid of direct reclaim and let allocating thread
wait on queue? I wished such change in context of __GFP_KILLABLE at
http://lkml.kernel.org/r/201702012049.BAG95379.VJFFOHMStLQFOO@xxxxxxxxxxxxxxxxxxx .

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html