Re: [PATCH 0/2] mm: skip memcg for certain address space

Michal Hocko <mhocko@xxxxxxxx> · Thu, 18 Jul 2024 10:10:21 +0200



On Thu 18-07-24 10:09:31, Michal Hocko wrote:
> On Thu 18-07-24 17:27:05, Qu Wenruo wrote:
> > 
> > 
> > On 2024/7/18 16:55, Michal Hocko wrote:
> > > On Thu 18-07-24 09:17:42, Vlastimil Babka (SUSE) wrote:
> > > > On 7/18/24 12:38 AM, Qu Wenruo wrote:
> > > [...]
> > > > > Does the folio order have anything to do with the problem, or does a
> > > > > higher order just make it more likely?
> > > > 
> > > > I didn't spot anything in the memcg charge path that would depend on the
> > > > order directly, hm. Also what kernel version was showing these soft lockups?
> > > 
> > > Correct. Order just defines the number of charges to be reclaimed.
> > > Unlike the page allocator path we do not have any specific requirements
> > > on the memory to be released.
> > 
> > So I guess the higher folio order just brings more pressure to trigger the
> > problem?
> 
> It increases the reclaim target (in number of pages to reclaim). That
> might contribute but we are cond_resched-ing in shrink_node_memcgs and
> also down the path in shrink_lruvec etc. So higher target shouldn't
> cause soft lockups unless we have a bug there - e.g. not triggering any
> of those paths with empty LRUs and looping somewhere. Not sure about
> MGLRU state of things TBH.
>  
> > > > > And finally, even without the hang problem, does it make any sense to
> > > > > skip all the possible memcg charge completely, either to reduce latency
> > > > > or just to reduce GFP_NOFAIL usage, for those user inaccessible inodes?
> > > 
> > > Let me just add to the pile of questions. Who does own this memory?
> > 
> > A special inode inside btrfs, we call it btree_inode, which is not
> > accessible out of the btrfs module, and its lifespan is the same as the
> > mounted btrfs filesystem.
> 
> But the memory charge is attributed to the caller unless you say
> otherwise. So if this is really an internal use, and you rely on shared
> infrastructure which expects the current task to be the owner of the
> charged memory, then you need to wrap the initialization in a
> set_active_memcg() scope.

Hit send too quickly; I meant to finish with:
... and use the root cgroup.
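
To illustrate the scoping pattern, here is a minimal userspace model, not
kernel code. Only the names set_active_memcg() and root_mem_cgroup mirror the
kernel; struct mem_cgroup is reduced to a counter, the override is a plain
global rather than per-task state, and charge_pages() and
alloc_internal_pages() are hypothetical helpers invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel structure: just a name and a counter. */
struct mem_cgroup {
	const char *name;
	long charged_pages;
};

static struct mem_cgroup root_storage = { "root", 0 };
static struct mem_cgroup *root_mem_cgroup = &root_storage;

/* Assumption: the override is modelled as a single global here. */
static struct mem_cgroup *active_memcg;

/* Install a new override and hand back the previous one for restoring. */
static struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = active_memcg;

	active_memcg = memcg;
	return old;
}

/*
 * Hypothetical helper: charge nr_pages to the override if one is set,
 * otherwise to the caller's own cgroup.
 */
static struct mem_cgroup *charge_pages(struct mem_cgroup *caller, long nr_pages)
{
	struct mem_cgroup *target = active_memcg ? active_memcg : caller;

	target->charged_pages += nr_pages;
	return target;
}

/*
 * Hypothetical helper: an internal allocation that should not be billed
 * to whichever task happens to trigger it.
 */
static void alloc_internal_pages(struct mem_cgroup *caller, long nr_pages)
{
	struct mem_cgroup *old = set_active_memcg(root_mem_cgroup);

	charge_pages(caller, nr_pages);	/* lands in the root cgroup */
	set_active_memcg(old);		/* always restore the previous scope */
}
```

Saving and restoring the old value keeps the scope nestable, and anything
charged outside the scope still goes to the caller as usual.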
-- 
Michal Hocko
SUSE Labs