On Thu 18-07-24 10:09:31, Michal Hocko wrote:
> On Thu 18-07-24 17:27:05, Qu Wenruo wrote:
> >
> >
> > On 2024/7/18 16:55, Michal Hocko wrote:
> > > On Thu 18-07-24 09:17:42, Vlastimil Babka (SUSE) wrote:
> > > > On 7/18/24 12:38 AM, Qu Wenruo wrote:
> > > [...]
> > > > > Does the folio order has anything related to the problem or just a
> > > > > higher order makes it more possible?
> > > >
> > > > I didn't spot anything in the memcg charge path that would depend on the
> > > > order directly, hm. Also what kernel version was showing these soft lockups?
> > >
> > > Correct. Order just defines the number of charges to be reclaimed.
> > > Unlike the page allocator path we do not have any specific requirements
> > > on the memory to be released.
> >
> > So I guess the higher folio order just brings more pressure to trigger the
> > problem?
>
> It increases the reclaim target (in number of pages to reclaim). That
> might contribute but we are cond_resched-ing in shrink_node_memcgs and
> also down the path in shrink_lruvec etc. So higher target shouldn't
> cause soft lockups unless we have a bug there - e.g. not triggering any
> of those paths with empty LRUs and looping somewhere. Not sure about
> MGLRU state of things TBH.
>
> > > > > And finally, even without the hang problem, does it make any sense to
> > > > > skip all the possible memcg charge completely, either to reduce latency
> > > > > or just to reduce GFP_NOFAIL usage, for those user inaccessible inodes?
> > >
> > > Let me just add to the pile of questions. Who does own this memory?
> >
> > A special inode inside btrfs, we call it btree_inode, which is not
> > accessible out of the btrfs module, and its lifespan is the same as the
> > mounted btrfs filesystem.
>
> But the memory charge is attributed to the caller unless you tell
> otherwise. So if this is really an internal use and you use a shared
> infrastructure which expects the current task to be owner of the charged
> memory then you need to wrap the initialization into set_active_memcg
> scope.

hit send too quickly, meant to finish with
... and use root cgroup.

-- 
Michal Hocko
SUSE Labs
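
For illustration only, here is a minimal userspace sketch of the
"wrap it into a set_active_memcg scope and use root cgroup" pattern
described above. It is a model, not kernel code: struct mem_cgroup,
root_mem_cgroup and set_active_memcg() are simplified stand-ins for the
kernel names, while task_memcg, charge_pages() and
alloc_internal_folio() are hypothetical helpers invented for the
example. The only behaviour it assumes is what the thread states: the
charge goes to the caller unless an override is active, and the folio
order only sets the number of pages charged.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct mem_cgroup. */
struct mem_cgroup {
	const char *name;
	unsigned long charged_pages;
};

/* Stand-in for the root cgroup, plus a hypothetical memcg of the
 * task that happens to trigger the allocation ("current"). */
static struct mem_cgroup root_mem_cgroup = { "root", 0 };
static struct mem_cgroup task_memcg = { "task", 0 };

/* Override slot; NULL means "charge the caller". */
static struct mem_cgroup *active_memcg;

/* Same calling convention as the kernel helper: install the new
 * override and return the previous one so it can be restored. */
static struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = active_memcg;

	active_memcg = memcg;
	return old;
}

/* Hypothetical model of a charge: the override wins, otherwise the
 * calling task's memcg is charged. Returns the memcg that paid. */
static struct mem_cgroup *charge_pages(unsigned long nr_pages)
{
	struct mem_cgroup *memcg = active_memcg ? active_memcg : &task_memcg;

	memcg->charged_pages += nr_pages;
	return memcg;
}

/* Hypothetical helper for a filesystem-internal allocation: open the
 * scope, charge 2^order pages, then restore the previous override. */
static struct mem_cgroup *alloc_internal_folio(unsigned int order)
{
	struct mem_cgroup *old = set_active_memcg(&root_mem_cgroup);
	struct mem_cgroup *charged = charge_pages(1UL << order);

	set_active_memcg(old);
	return charged;
}
```

Saving and restoring the previous value, rather than resetting to
NULL, keeps the scope safe to nest inside another override.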