On Wed 03-05-23 19:49:19, Hui Wang wrote:
>
> On 4/29/23 03:53, Michal Hocko wrote:
> > On Thu 27-04-23 11:47:10, Hui Wang wrote:
> > [...]
> > > So Michal,
> > >
> > > Don't know if you read the "[PATCH 0/1] mm/oom_kill: system enters a state
> > > something like hang when running stress-ng", do you know why out_of_memory()
> > > will return immediately if there is no __GFP_FS, could we drop these lines
> > > directly:
> > >
> > >     /*
> > >      * The OOM killer does not compensate for IO-less reclaim.
> > >      * pagefault_out_of_memory lost its gfp context so we have to
> > >      * make sure exclude 0 mask - all other users should have at least
> > >      * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to
> > >      * invoke the OOM killer even if it is a GFP_NOFS allocation.
> > >      */
> > >     if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
> > >         return true;
> > The comment is rather hard to grasp without an intimate knowledge of the
> > memory reclaim. The primary reason is that the allocation context
> > without __GFP_FS (and also __GFP_IO) cannot perform a full memory
> > reclaim because fs or the storage subsystem might be holding locks
> > required for the memory reclaim. This means that a large amount of
> > reclaimable memory is out of sight of the specific direct reclaim
> > context. If we allowed oom killer to trigger we could invoke the oom
> > killer while there is a lot of otherwise reclaimable memory. As you can
> > imagine not something many users would appreciate as the oom kill is a
> > very disruptive operation. In this case we rely on kswapd or other
> > GFP_KERNEL like allocation context to make forward instead. If there is
> > really nothing reclaimable then the oom killer would eventually hit from
> > elsewhere.
> >
> > HTH
> Hi Michal,
>
> Understand. Thanks for explanation. So we can't remove those 2 lines of
> code.
>
> Here in my patch, letting a kthread allocate a page with GFP_KERNEL, It
> could possibly trigger the reclaim and if nothing reclaimable, trigger the
> oom killer. Do you think it is a safe workaround for the issue we are facing
> currently?

I have to say I really dislike this workaround. Allocating memory just
to release it, and potentially hitting the oom killer in the process, is
really not a very mindful approach to the problem. It is not a reliable
approach either, because you depend on the WQ context, which might be
clogged by the very same lack of memory.

Unfortunately, this issue simply doesn't have a simple and neat
solution. I would prefer the fs to be less demanding from NOFS context,
if that is possible at all.
-- 
Michal Hocko
SUSE Labs
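
For readers following the thread, the early return quoted above can be modelled in userspace as a rough sketch. Everything prefixed with MODEL_ or model_ is hypothetical: the flag values are illustrative and are not the kernel's real bit assignments, the kernel's __GFP_RECLAIM is simplified to a single direct-reclaim bit, and the memcg_oom field stands in for is_memcg_oom(oc). Only the shape of the condition is taken from the quoted code.

```c
#include <stdbool.h>

/* Illustrative flag bits only; NOT the kernel's real values. */
#define MODEL_GFP_IO             0x1u
#define MODEL_GFP_FS             0x2u
#define MODEL_GFP_DIRECT_RECLAIM 0x4u

/* GFP_NOFS may reclaim and do IO, but must not call into the fs. */
#define MODEL_GFP_NOFS   (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_IO)
/* GFP_KERNEL additionally allows fs callbacks, so reclaim can be full. */
#define MODEL_GFP_KERNEL (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

/* Hypothetical stand-in for the two oom_control facts the check uses. */
struct model_oom_control {
    unsigned int gfp_mask; /* 0 models the page-fault path (context lost) */
    bool memcg_oom;        /* models is_memcg_oom(oc) */
};

/*
 * Mirrors the quoted condition: return true ("do not kill anything")
 * when the mask is non-zero, lacks the FS bit, and this is not a
 * memcg OOM.
 */
static bool model_oom_bails_out(const struct model_oom_control *oc)
{
    return oc->gfp_mask &&
           !(oc->gfp_mask & MODEL_GFP_FS) &&
           !oc->memcg_oom;
}
```

In this model a MODEL_GFP_NOFS request bails out without a kill, while a MODEL_GFP_KERNEL request, a zero mask, or a memcg OOM falls through to the rest of the OOM path. That matches the explanation in the thread: a NOFS context cannot see all reclaimable memory, so it leaves the kill decision to a context that can.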