On Mon 02-09-24 11:02:50, Yafang Shao wrote:
> On Sun, Sep 1, 2024 at 11:35 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
[...]
> > AIUI, the memory allocation looping has back-offs already built in
> > to it when memory reserves are exhausted and/or reclaim is
> > congested.
> >
> > e.g:
> >
> > get_page_from_freelist()
> >   (zone below watermark)
> >   node_reclaim()
> >     __node_reclaim()
> >       shrink_node()
> >         reclaim_throttle()
>
> It applies to all kinds of allocations.
>
> >
> > And the call to recalim_throttle() will do the equivalent of
> > memalloc_retry_wait() (a 2ms sleep).
>
> I'm wondering if we should take special action for __GFP_NOFAIL, as
> currently, it only results in an endless loop with no intervention.

If the memory allocator/reclaim is thrashing on a couple of remaining
pages that are easy to drop and reallocate again, then the same endless
loop is de facto the behavior for _all_ non-costly allocations. All of
them will loop. This is not really great, but so far we haven't
developed a reliable thrashing detection that would suit all potential
workloads. There are some that simply benefit from work not being lost,
even if the cost is a severe performance penalty. The general conclusion
has been that workloads which would rather see the OOM killer trigger
early should implement that policy in userspace. We have PSI, refault
counters and other tools that can be used to detect pathological
patterns and trigger a workload-specific action.

I really do not see why __GFP_NOFAIL should be special in this specific
case.
-- 
Michal Hocko
SUSE Labs
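
For illustration only, here is a minimal sketch of the kind of userspace
policy mentioned above, using the PSI memory file. The file path and its
"some"/"full" line format are the kernel's; the helper names
(parse_psi, is_thrashing) and the 20% threshold are assumptions made up
for this example, not something proposed in the thread. A real policy
would be workload-specific and would likely also look at refault
counters.

```python
from pathlib import Path

# PSI memory pressure file; only present on kernels built with CONFIG_PSI.
PSI_MEMORY = Path("/proc/pressure/memory")


def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}}.

    Each input line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            # "total" is a stall time in microseconds; the avgN fields are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def is_thrashing(psi, full_avg10_limit=20.0):
    """Hypothetical policy: report thrashing when all non-idle tasks were
    stalled on memory for at least full_avg10_limit percent of the last 10s.
    The threshold is an arbitrary example value."""
    return psi.get("full", {}).get("avg10", 0.0) >= full_avg10_limit


if __name__ == "__main__":
    if PSI_MEMORY.exists():
        psi = parse_psi(PSI_MEMORY.read_text())
        # A real policy would take a workload-specific action here
        # (e.g. shed load or kill a chosen process) instead of printing.
        print("thrashing" if is_thrashing(psi) else "ok", psi)
    else:
        print("PSI not available on this kernel")
```

The point of the sketch is only that the decision to give up early lives
in userspace, where the workload's tolerance for stalls is known, rather
than in the allocator.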