Re: [PATCH 1/1] mm/oom_kill: trigger the oom killer if oom occurs without __GFP_FS

Michal Hocko <mhocko@xxxxxxxx> · Wed, 3 May 2023 14:20:31 +0200

On Wed 03-05-23 19:49:19, Hui Wang wrote:
> 
> On 4/29/23 03:53, Michal Hocko wrote:
> > On Thu 27-04-23 11:47:10, Hui Wang wrote:
> > [...]
> > > So Michal,
> > > 
> > > Don't know if you read the "[PATCH 0/1] mm/oom_kill: system enters a state
> > > something like hang when running stress-ng", do you know why out_of_memory()
> > > will return immediately if there is no __GFP_FS, could we drop these lines
> > > directly:
> > > 
> > >      /*
> > >       * The OOM killer does not compensate for IO-less reclaim.
> > >       * pagefault_out_of_memory lost its gfp context so we have to
> > >       * make sure exclude 0 mask - all other users should have at least
> > >       * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to
> > >       * invoke the OOM killer even if it is a GFP_NOFS allocation.
> > >       */
> > >      if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
> > >          return true;
> > The comment is rather hard to grasp without an intimate knowledge of the
> > memory reclaim. The primary reason is that the allocation context
> > without __GFP_FS (and also __GFP_IO) cannot perform a full memory
> > reclaim because fs or the storage subsystem might be holding locks
> > required for the memory reclaim. This means that a large amount of
> > reclaimable memory is out of sight of the specific direct reclaim
> > context. If we allowed the oom killer to trigger we could invoke the oom
> > killer while there is a lot of otherwise reclaimable memory. As you can
> > imagine, that is not something many users would appreciate, as the oom kill
> > is a very disruptive operation. In this case we rely on kswapd or other
> > GFP_KERNEL like allocation context to make forward progress instead. If there is
> > really nothing reclaimable then the oom killer would eventually hit from
> > elsewhere.
> > 
> > HTH
> Hi Michal,
> 
> Understood. Thanks for the explanation. So we can't remove those 2 lines of
> code.
> 
> Here in my patch, a kthread allocates a page with GFP_KERNEL. It
> could possibly trigger the reclaim and, if nothing is reclaimable, trigger the
> oom killer. Do you think it is a safe workaround for the issue we are facing
> currently?

I have to say I really dislike this workaround. Allocating memory just
to release it and potentially hit the oom killer is really not a very
mindful approach to the problem. It is not a reliable way either because
you depend on the WQ context which might be clogged for the very same
lack of memory. This issue simply doesn't have a simple and neat
solution unfortunately.

I would prefer if the fs could be less demanding from NOFS context if
that is possible at all.
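
For reference, the early return discussed above can be modelled in
userspace roughly as follows. This is only a minimal sketch to show which
contexts bail out: the SKETCH_* flag values, struct sketch_oom_control and
sketch_skip_oom_kill() are made-up stand-ins for illustration, not the
kernel's definitions, and the memcg case is reduced to a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag bits for illustration only; NOT the kernel's real values. */
#define SKETCH_GFP_IO             0x1u
#define SKETCH_GFP_FS             0x2u
#define SKETCH_GFP_DIRECT_RECLAIM 0x4u

/* Hypothetical, stripped-down model of the oom context. */
struct sketch_oom_control {
	unsigned int gfp_mask;	/* 0 models the page fault path, which lost its gfp context */
	bool memcg_oom;		/* stands in for is_memcg_oom(oc) */
};

/*
 * Mirrors the shape of the quoted condition: a non-zero mask without the
 * FS bit skips the oom killer unless this is a memcg oom.
 */
static bool sketch_skip_oom_kill(const struct sketch_oom_control *oc)
{
	return oc->gfp_mask && !(oc->gfp_mask & SKETCH_GFP_FS) && !oc->memcg_oom;
}

/* Small self-check covering the four interesting contexts. */
static void sketch_selfcheck(void)
{
	struct sketch_oom_control nofs = { SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_IO, false };
	struct sketch_oom_control kernel = { SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_IO | SKETCH_GFP_FS, false };
	struct sketch_oom_control fault = { 0, false };
	struct sketch_oom_control memcg_nofs = { SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_IO, true };

	assert(sketch_skip_oom_kill(&nofs));		/* NOFS: rely on others to make progress */
	assert(!sketch_skip_oom_kill(&kernel));		/* GFP_KERNEL like: oom killer may run */
	assert(!sketch_skip_oom_kill(&fault));		/* zero mask: not skipped */
	assert(!sketch_skip_oom_kill(&memcg_nofs));	/* memcg oom: not skipped even for NOFS */
}
```

Only the first case returns early, which is the NOFS situation from the
report.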
-- 
Michal Hocko
SUSE Labs