Re: [PATCH 0/2] mm: skip memcg for certain address space

Michal Hocko <mhocko@xxxxxxxx> · Thu, 18 Jul 2024 11:25:46 +0200

On Thu 18-07-24 18:22:11, Qu Wenruo wrote:
> 
> 
> On 2024/7/18 17:39, Michal Hocko wrote:
> > On Thu 18-07-24 17:27:05, Qu Wenruo wrote:
> > > 
> > > 
> > > On 2024/7/18 16:55, Michal Hocko wrote:
> > > > On Thu 18-07-24 09:17:42, Vlastimil Babka (SUSE) wrote:
> > > > > On 7/18/24 12:38 AM, Qu Wenruo wrote:
> > > > [...]
> > > > > > > Does the folio order have anything to do with the problem, or does a
> > > > > > > higher order just make it more likely?
> > > > > 
> > > > > I didn't spot anything in the memcg charge path that would depend on the
> > > > > order directly, hm. Also what kernel version was showing these soft lockups?
> > > > 
> > > > Correct. Order just defines the number of charges to be reclaimed.
> > > > Unlike the page allocator path we do not have any specific requirements
> > > > on the memory to be released.
> > > 
> > > So I guess the higher folio order just brings more pressure to trigger the
> > > problem?
> > 
> > It increases the reclaim target (in number of pages to reclaim). That
> > might contribute but we are cond_resched-ing in shrink_node_memcgs and
> > also down the path in shrink_lruvec etc. So higher target shouldn't
> > cause soft lockups unless we have a bug there - e.g. not triggering any
> > of those paths with empty LRUs and looping somewhere. Not sure about
> > MGLRU state of things TBH.
> > 
> > > > > > > And finally, even without the hang problem, does it make any sense to
> > > > > > > skip memcg charging completely for those user-inaccessible inodes,
> > > > > > > either to reduce latency or just to reduce GFP_NOFAIL usage?
> > > > 
> > > > Let me just add to the pile of questions. Who owns this memory?
> > > 
> > > A special inode inside btrfs, which we call btree_inode. It is not
> > > accessible outside of the btrfs module, and its lifespan is the same as
> > > that of the mounted btrfs filesystem.
> > 
> > But the memory charge is attributed to the caller unless you tell
> > otherwise.
> 
> By the caller, did you mean the user space program that triggered the
> filesystem operations?

Yes, the current task while these operations are done.

[...]
> > So if this is really an internal use and you use a shared
> > infrastructure which expects the current task to be the owner of the
> > charged memory, then you need to wrap the initialization in a
> > set_active_memcg scope.
> > 
> 
> And for the root cgroup, I guess that means there are no memory limits at
> all, and filemap_add_folio() should always succeed (except for real -ENOMEM
> situations, or the -EEXIST error that btrfs would handle)?

Yes. try_charge will bypass charging altogether for the root cgroup. You
will likely need to guard any root_mem_cgroup usage with #ifdef CONFIG_MEMCG.
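
To illustrate the scoping pattern discussed in this thread, here is a
minimal user-space model. It is not kernel code. set_active_memcg,
root_mem_cgroup and try_charge are real kernel names, but the bodies
below are simplified stand-ins written for this sketch. The struct
layout and the charge_folio() and add_internal_folio() helpers are
hypothetical. Reclaim is not modelled at all.

```c
/* User-space model only: simplified stand-ins, not the kernel's code. */
#include <assert.h>
#include <stddef.h>

struct mem_cgroup {
	long usage;	/* pages currently charged */
	long limit;	/* hard limit in pages, ignored for root */
};

static struct mem_cgroup root_memcg_storage;
static struct mem_cgroup *root_mem_cgroup = &root_memcg_storage;

/* Per-task override. NULL means "charge the current task's memcg". */
static struct mem_cgroup *active_memcg;

/* Install a new override and return the previous one for restoring. */
static struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = active_memcg;

	active_memcg = memcg;
	return old;
}

/* Stand-in for charging: root is bypassed, others fail at the limit. */
static int try_charge(struct mem_cgroup *memcg, long nr_pages)
{
	if (memcg == root_mem_cgroup)
		return 0;
	if (memcg->usage + nr_pages > memcg->limit)
		return -1;	/* the kernel would try reclaim before failing */
	memcg->usage += nr_pages;
	return 0;
}

/* Hypothetical helper: the order only sets the number of pages charged. */
int charge_folio(struct mem_cgroup *task_memcg, unsigned int order)
{
	struct mem_cgroup *memcg = active_memcg ? active_memcg : task_memcg;

	return try_charge(memcg, 1L << order);
}

/* Hypothetical helper: internal allocation wrapped in a root memcg scope. */
int add_internal_folio(struct mem_cgroup *task_memcg, unsigned int order)
{
	struct mem_cgroup *old = set_active_memcg(root_mem_cgroup);
	int ret = charge_folio(task_memcg, order);

	set_active_memcg(old);
	assert(active_memcg == old);	/* scope restored on exit */
	return ret;
}
```

In this model, an order-3 folio charged against a task whose memcg is
limited to 4 pages fails, while the same request through
add_internal_folio() succeeds and leaves the task's usage untouched.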

-- 
Michal Hocko
SUSE Labs