On Wed 17-07-24 17:55:23, Vlastimil Babka (SUSE) wrote:
> Hi,
>
> you should have Ccd people according to get_maintainers script to get a
> reply faster. Let me Cc the MEMCG section.
>
> On 7/10/24 3:07 AM, Qu Wenruo wrote:
> > Recently I'm hitting soft lockup if adding an order 2 folio to a
> > filemap using GFP_NOFS | __GFP_NOFAIL. The softlockup happens at memcg
> > charge code, and I guess that's exactly what __GFP_NOFAIL is expected to
> > do, wait indefinitely until the request can be met.
>
> Seems like a bug to me, as the charging of __GFP_NOFAIL in
> try_charge_memcg() should proceed to the force: part AFAICS and just go over
> the limit.
>
> I was suspecting mem_cgroup_oom() a bit earlier return true, causing the
> retry loop, due to GFP_NOFS. But it seems out_of_memory() should be
> specifically proceeding for GFP_NOFS if it's memcg oom. But I might be
> missing something else. Anyway we should know what exactly is going first.

Correct. The memcg oom code will invoke the memcg OOM killer for NOFS
requests. See out_of_memory():

	/*
	 * The OOM killer does not compensate for IO-less reclaim.
	 * But mem_cgroup_oom() has to invoke the OOM killer even
	 * if it is a GFP_NOFS allocation.
	 */
	if (!(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
		return true;

That means that a victim will be killed, its charges reclaimed and
forward progress made. If there is no victim then the charging path will
bail out and overcharge.

Also, the reclaim path should have cond_resched() calls. If those are
not sufficient, that should be fixed rather than worked around.

-- 
Michal Hocko
SUSE Labs
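
For illustration only, the decision described above can be modelled in a
few lines of userspace C. This is a rough sketch, not kernel code: the
SKETCH_* flag value, struct sketch_oom_control and both helper functions
are hypothetical stand-ins, and only the shape of the check is taken from
the out_of_memory() snippet quoted in this mail.

```c
#include <stdbool.h>

/* Hypothetical flag bit; NOT the kernel's real __GFP_FS value. */
#define SKETCH_GFP_FS 0x1u

/* Hypothetical, reduced stand-in for struct oom_control. */
struct sketch_oom_control {
	unsigned int gfp_mask;
	bool memcg_oom;		/* models is_memcg_oom(oc) */
};

/*
 * Models the early "return true" in out_of_memory(): the OOM killer is
 * skipped only for a NOFS request that is not a memcg OOM.
 */
static bool sketch_oom_killer_skipped(const struct sketch_oom_control *oc)
{
	return !(oc->gfp_mask & SKETCH_GFP_FS) && !oc->memcg_oom;
}

enum sketch_charge_result {
	SKETCH_CHARGE_RETRY,	/* victim killed, charges get reclaimed */
	SKETCH_CHARGE_FORCE,	/* no victim: bail out and overcharge */
};

/*
 * Models the outcome described in the mail, not the actual
 * try_charge_memcg() control flow.
 */
static enum sketch_charge_result sketch_after_memcg_oom(bool victim_killed)
{
	return victim_killed ? SKETCH_CHARGE_RETRY : SKETCH_CHARGE_FORCE;
}
```

With this model, a NOFS request in a memcg OOM is never skipped, so it
either makes progress through a killed victim or ends up overcharging.
Neither outcome should spin forever.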