Re: [PATCH 0/2] mm: skip memcg for certain address space

Qu Wenruo <quwenruo.btrfs@xxxxxxx> · Thu, 18 Jul 2024 08:08:29 +0930

On 2024/7/18 01:44, Michal Hocko wrote:
On Wed 17-07-24 17:55:23, Vlastimil Babka (SUSE) wrote:
Hi,

you should have Cc'd the people listed by the get_maintainer.pl script to
get a reply faster. Let me Cc the MEMCG section.

On 7/10/24 3:07 AM, Qu Wenruo wrote:
Recently I've been hitting a soft lockup when adding an order 2 folio to a
filemap using GFP_NOFS | __GFP_NOFAIL. The soft lockup happens in the memcg
charge code, and I guess that's exactly what __GFP_NOFAIL is expected to
do: wait indefinitely until the request can be met.

That seems like a bug to me, as AFAICS charging a __GFP_NOFAIL request in
try_charge_memcg() should proceed to the force: label and just go over
the limit.

At first I suspected that mem_cgroup_oom() was returning true because of
GFP_NOFS, causing the retry loop. But out_of_memory() seems to proceed
specifically for GFP_NOFS when it is a memcg OOM. I might be missing
something else, though. Anyway, we should first find out what exactly is
going on.

Correct. The memcg OOM code will invoke the memcg OOM killer for NOFS
requests. See out_of_memory():

         /*
          * The OOM killer does not compensate for IO-less reclaim.
          * But mem_cgroup_oom() has to invoke the OOM killer even
          * if it is a GFP_NOFS allocation.
          */
         if (!(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
                 return true;

That means that there will be a victim killed, charges reclaimed and
forward progress made. If there is no victim then the charging path will
bail out and overcharge.

Also, the reclaim path should contain cond_resched() calls. If that is
not sufficient, it should be fixed rather than worked around.

Another question: I only see this hang with larger folios (order 2 vs.
the old order 0) when adding to the same address space.

Does the folio order have anything to do with the problem, or does a
higher order just make it more likely?

And finally, even without the hang problem, does it make sense to skip
memcg charging completely for those user-inaccessible inodes, either to
reduce latency or just to reduce __GFP_NOFAIL usage?

Thanks,
Qu