Re: [PATCH] bcachefs: Switch to memalloc_flags_do() for vmalloc allocations

Michal Hocko <mhocko@xxxxxxxx> · Mon, 2 Sep 2024 10:11:43 +0200

On Mon 02-09-24 11:02:50, Yafang Shao wrote:
> On Sun, Sep 1, 2024 at 11:35 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
[...]
> > AIUI, the memory allocation looping has back-offs already built in
> > to it when memory reserves are exhausted and/or reclaim is
> > congested.
> >
> > e.g:
> >
> > get_page_from_freelist()
> >   (zone below watermark)
> >   node_reclaim()
> >     __node_reclaim()
> >       shrink_node()
> >         reclaim_throttle()
> 
> It applies to all kinds of allocations.
> 
> >
> > And the call to reclaim_throttle() will do the equivalent of
> > memalloc_retry_wait() (a 2ms sleep).
> 
> I'm wondering if we should take special action for __GFP_NOFAIL, as
> currently, it only results in an endless loop with no intervention.

If the memory allocator/reclaim is thrashing on a couple of remaining pages
that are easy to drop and reallocate again then the same endless loop
is de facto the behavior for _all_ non-costly allocations. All of them
will loop. This is not really great but so far we haven't
developed a reliable thrashing detection that would suit all potential
workloads. There are some that simply benefit from work not being lost
even if the cost is a severe performance penalty. A general conclusion
has been that workloads which would rather see the OOM killer triggering
early should implement that policy in userspace. We have PSI,
refault counters and other tools that could be used to detect
pathological patterns and trigger workload-specific action.
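
To make the userspace side a bit more concrete, here is a rough sketch of
what a PSI-based check could look like. It parses lines in the format used
by /proc/pressure/memory and applies a trivial policy on the "full" avg10
value. The helper names (psi_parse_line, psi_should_act, psi_read_full) and
the threshold are made up for illustration; what to do once the policy
fires (kill, throttle, shed load) is entirely workload specific.

```c
#include <stdio.h>
#include <string.h>

/*
 * One line of /proc/pressure/memory, e.g.
 *   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
 */
struct psi_sample {
	char kind[8];			/* "some" or "full" */
	double avg10, avg60, avg300;	/* % of time stalled, 10s/60s/300s windows */
	unsigned long long total;	/* cumulative stall time in microseconds */
};

/* Parse one PSI line; returns 0 on success, -1 on malformed input. */
static int psi_parse_line(const char *line, struct psi_sample *s)
{
	int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       s->kind, &s->avg10, &s->avg60, &s->avg300, &s->total);

	return n == 5 ? 0 : -1;
}

/*
 * Hypothetical policy: act once the "full" avg10 value exceeds
 * threshold_pct. The threshold is an assumption, not a recommendation.
 */
static int psi_should_act(const struct psi_sample *s, double threshold_pct)
{
	return strcmp(s->kind, "full") == 0 && s->avg10 > threshold_pct;
}

/* Read the "full" line from a PSI file; returns 0 on success, -1 otherwise. */
static int psi_read_full(const char *path, struct psi_sample *s)
{
	char buf[256];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	while (fgets(buf, sizeof(buf), f)) {
		if (psi_parse_line(buf, s) == 0 &&
		    strcmp(s->kind, "full") == 0) {
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

A monitor would call psi_read_full("/proc/pressure/memory", &s)
periodically and take its workload-specific action when
psi_should_act(&s, threshold) returns non-zero. Polling is only the
simplest option here; PSI also supports triggers that can be waited on
with poll().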

I really do not see why GFP_NOFAIL should be special in any way in this
specific case.
-- 
Michal Hocko
SUSE Labs