On Thu 18-07-24 18:22:11, Qu Wenruo wrote:
>
>
> On 2024/7/18 17:39, Michal Hocko wrote:
> > On Thu 18-07-24 17:27:05, Qu Wenruo wrote:
> > >
> > >
> > > On 2024/7/18 16:55, Michal Hocko wrote:
> > > > On Thu 18-07-24 09:17:42, Vlastimil Babka (SUSE) wrote:
> > > > > On 7/18/24 12:38 AM, Qu Wenruo wrote:
> > > > [...]
> > > > > > Does the folio order has anything related to the problem or just a
> > > > > > higher order makes it more possible?
> > > > >
> > > > > I didn't spot anything in the memcg charge path that would depend on the
> > > > > order directly, hm. Also what kernel version was showing these soft lockups?
> > > >
> > > > Correct. Order just defines the number of charges to be reclaimed.
> > > > Unlike the page allocator path we do not have any specific requirements
> > > > on the memory to be released.
> > >
> > > So I guess the higher folio order just brings more pressure to trigger the
> > > problem?
> >
> > It increases the reclaim target (in number of pages to reclaim). That
> > might contribute but we are cond_resched-ing in shrink_node_memcgs and
> > also down the path in shrink_lruvec etc. So higher target shouldn't
> > cause soft lockups unless we have a bug there - e.g. not triggering any
> > of those paths with empty LRUs and looping somewhere. Not sure about
> > MGLRU state of things TBH.
> >
> > > > > > And finally, even without the hang problem, does it make any sense to
> > > > > > skip all the possible memcg charge completely, either to reduce latency
> > > > > > or just to reduce GFP_NOFAIL usage, for those user inaccessible inodes?
> > > >
> > > > Let me just add to the pile of questions. Who does own this memory?
> > >
> > > A special inode inside btrfs, we call it btree_inode, which is not
> > > accessible out of the btrfs module, and its lifespan is the same as the
> > > mounted btrfs filesystem.
> >
> > But the memory charge is attributed to the caller unless you tell
> > otherwise.
>
> By the caller, did you mean the user space program who triggered the
> filesystem operations?

Yes, the current task while these operations are done.

[...]
> > So if this is really an internal use and you use a shared
> > infrastructure which expects the current task to be owner of the charged
> > memory then you need to wrap the initialization into set_active_memcg
> > scope.
> >
>
> And for root cgroup I guess it means we will have no memory limits or
> whatever, and filemap_add_folio() should always success (except real -ENOMEM
> situations or -EEXIST error btrfs would handle)?

Yes. try_charge will bypass charging altogether for root cgroup. You will
likely need to ifdef root_mem_cgroup usage by CONFIG_MEMCG.
-- 
Michal Hocko
SUSE Labs
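
To illustrate the set_active_memcg scope and the root cgroup bypass
discussed above, here is a minimal userspace model in C. It is only a
sketch: every name ending in _model is hypothetical and merely stands in
for the kernel's set_active_memcg(), try_charge and root_mem_cgroup. None
of this is kernel code, and the real charge path is considerably more
involved.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a memory cgroup; NOT the kernel structure. */
struct mem_cgroup {
	const char *name;
	long charged_pages;
};

/* Hypothetical models of the root cgroup and of the current task's cgroup. */
static struct mem_cgroup root_memcg_model = { "root", 0 };
static struct mem_cgroup task_memcg_model = { "task", 0 };

/* Models the per-task "active memcg" override; NULL means no override. */
static struct mem_cgroup *active_override;

/* Install a new override and return the previous one so it can be restored. */
static struct mem_cgroup *set_active_memcg_model(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = active_override;

	active_override = memcg;
	return old;
}

/*
 * Pick the cgroup to charge: the override if one is set, otherwise the
 * current task's cgroup. Charging is skipped entirely for the root model.
 */
static struct mem_cgroup *charge_model(long nr_pages)
{
	struct mem_cgroup *target;

	assert(nr_pages > 0);
	target = active_override ? active_override : &task_memcg_model;
	if (target != &root_memcg_model)
		target->charged_pages += nr_pages;
	return target;
}

/* Internal allocation: wrap the charge in a root-cgroup override scope. */
static struct mem_cgroup *add_internal_folio_model(long nr_pages)
{
	struct mem_cgroup *old = set_active_memcg_model(&root_memcg_model);
	struct mem_cgroup *charged_to = charge_model(nr_pages);

	set_active_memcg_model(old);	/* always restore the previous owner */
	return charged_to;
}
```

In the kernel itself the same save/restore shape would wrap the
filemap_add_folio() call, with the root_mem_cgroup reference guarded by
CONFIG_MEMCG as noted above.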