Re: [PATCH] writeback: Avoid exhausting allocation reserves under memory pressure

Michal Hocko <mhocko@xxxxxxxxxx> · Mon, 16 May 2016 13:45:28 +0200

On Thu 12-05-16 18:08:29, Jan Kara wrote:
> On Thu 05-05-16 14:37:51, Andrew Morton wrote:
[...]
> > bdi_split_work_to_wbs() does GFP_ATOMIC as well.  Problem?  (Why the
> > heck don't we document the *reasons* for these things, sigh).
> 
> Heh, there are much more GFP_ATOMIC allocations in fs/fs-writeback.c after
> Tejun's memcg aware writeback... I believe they are GFP_ATOMIC mostly
> because they can already be called from direct reclaim (e.g. when
> requesting pages to be written through wakeup_flusher_threads()) and so we
> don't want to recurse into direct reclaim code again.

If that is the case, then __GFP_DIRECT_RECLAIM should be cleared rather
than GFP_ATOMIC being abused.
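
To illustrate the difference, here is a rough user-space sketch, not
kernel code. The bit values are made up for the example (the real ones
live in include/linux/gfp.h), the flag composition is my reading of the
current definitions, and gfp_no_direct_reclaim() and
gfp_is_high_priority() are hypothetical helpers, not existing kernel
functions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; see include/linux/gfp.h for the real ones. */
#define __GFP_HIGH            0x01u  /* high priority, may go below watermarks */
#define __GFP_IO              0x02u  /* may start physical I/O */
#define __GFP_FS              0x04u  /* may call into the filesystem */
#define __GFP_ATOMIC          0x08u  /* cannot sleep, high priority */
#define __GFP_DIRECT_RECLAIM  0x10u  /* caller may enter direct reclaim */
#define __GFP_KSWAPD_RECLAIM  0x20u  /* wake kswapd for background reclaim */

#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL    (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_ATOMIC    (__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM)

/* Hypothetical helper: drop only the "may recurse into direct reclaim" bit. */
static inline gfp_t gfp_no_direct_reclaim(gfp_t gfp)
{
	return gfp & ~__GFP_DIRECT_RECLAIM;
}

/* Hypothetical helper: does the mask ask for high-priority treatment? */
static inline bool gfp_is_high_priority(gfp_t gfp)
{
	return (gfp & (__GFP_HIGH | __GFP_ATOMIC)) != 0;
}

/* GFP_ATOMIC avoids direct reclaim too, but only as a side effect. */
static_assert((GFP_ATOMIC & __GFP_DIRECT_RECLAIM) == 0,
	      "GFP_ATOMIC never enters direct reclaim");
```

In this model GFP_KERNEL & ~__GFP_DIRECT_RECLAIM avoids the recursion
without asking for high-priority treatment, while GFP_ATOMIC avoids it
and additionally sets __GFP_HIGH | __GFP_ATOMIC.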

> > I suspect it would be best to be proactive here and use some smarter
> > data structure.  It appears that all the wb_writeback_work fields
> > except sb can be squeezed into a single word so perhaps a radix-tree. 
> > Or hash them all together and use a chained array or something.  Maybe
> > fiddle at it for an hour or so, see how it's looking?  It's a lot of
> > fuss to avoid one problematic kmalloc(), sigh.
> > 
> > We really don't want there to be *any* pathological workload which
> > results in merging failures - if that's the case then someone will hit
> > it.  They'll experience the ooms (perhaps) and the search complexity
> > issues (for sure).
> 
> So the question is what is the desired outcome. After Tetsuo's patch
> "mm,writeback: Don't use memory reserves for wb_start_writeback" we will
> use GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN instead of GFP_ATOMIC in
> wb_start_writeback(). We can treat other places using GFP_ATOMIC in a
> similar way. So my thought was that this is enough to avoid exhaustion of
> reserves for writeback work items under memory pressure. And the merging of
> writeback works I proposed was more like an optimization to avoid
> unnecessary allocations. And in that case we can allow imperfection and
> possibly large lists of queued works in pathological cases - I agree we
> should not DoS the system by going through large linked lists in any case but
> that is easily avoided if we are fine with the fact that merging won't happen
> always when it could.

Yes, I think this is acceptable.

> The question which is not clear to me is: Do we want to guard against
> malicious attacker that may be consuming memory through writeback works
> that are allocated via GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN? 
> If yes, then my patch needs further thought. Any opinions?

GFP_NOWAIT still kicks kswapd, so there is some reclaim activity
in the background. Sure, if we can reduce the number of those requests
it would be better, because we are losing the natural throttling without
direct reclaim. But I am not sure I can see how this would cause a
major problem (slowing down everybody is quite possible, but not a DoS
AFAICS).
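
As a rough user-space model of that mask (not kernel code): the bit
values below are invented for the example, WB_WORK_GFP is just a
hypothetical name for the combination quoted above, and the three
predicates are hypothetical helpers. It assumes GFP_NOWAIT is
__GFP_KSWAPD_RECLAIM alone, which is how I read the current
include/linux/gfp.h.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only, not the kernel's. */
enum {
	__GFP_HIGH           = 1u << 0,  /* high priority */
	__GFP_ATOMIC         = 1u << 1,  /* cannot sleep, high priority */
	__GFP_DIRECT_RECLAIM = 1u << 2,  /* caller may reclaim synchronously */
	__GFP_KSWAPD_RECLAIM = 1u << 3,  /* wake kswapd */
	__GFP_NOMEMALLOC     = 1u << 4,  /* do not touch emergency reserves */
	__GFP_NOWARN         = 1u << 5,  /* no allocation failure warning */
};

#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM)
/* Hypothetical name for the mask discussed in this thread. */
#define WB_WORK_GFP (GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN)

/* Background reclaim is still triggered. */
static inline bool gfp_wakes_kswapd(gfp_t gfp)
{
	return (gfp & __GFP_KSWAPD_RECLAIM) != 0;
}

/* Direct reclaim is what throttles the allocating task. */
static inline bool gfp_may_direct_reclaim(gfp_t gfp)
{
	return (gfp & __GFP_DIRECT_RECLAIM) != 0;
}

/* High-priority bits are what GFP_ATOMIC would have added. */
static inline bool gfp_is_high_priority(gfp_t gfp)
{
	return (gfp & (__GFP_HIGH | __GFP_ATOMIC)) != 0;
}

static_assert((WB_WORK_GFP & __GFP_DIRECT_RECLAIM) == 0,
	      "the writeback work mask never enters direct reclaim");
```

So the allocation wakes kswapd but is neither throttled by direct
reclaim nor treated as high priority, which is the trade-off described
above.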
-- 
Michal Hocko
SUSE Labs