On Sat, Sep 28, 2024 at 02:15:56PM +0930, Qu Wenruo wrote:
> [BACKGROUND]
> The function filemap_add_folio() charges the memory cgroup,
> as we assume all page caches are accessible by user space processes
> and thus need the cgroup accounting.
>
> However btrfs is a special case: it has very large metadata thanks to
> its support of data csum (by default it's 4 bytes per 4K data, and can
> be as large as 32 bytes per 4K data).
> This means btrfs has to go through the page cache for its metadata pages,
> to take advantage of both the caching and the reclaim ability of filemap.

FYI, in general reclaim for metadata works much better with a shrinker
than through the pagecache, because it can be object-based and
prioritized.

> [ENHANCEMENT]
> Instead of relying on __GFP_NOFAIL to avoid charge failure, use the root
> memory cgroup to attach metadata pages.
>
> This does need to export the symbol mem_root_cgroup for
> CONFIG_MEMCG, or define mem_root_cgroup as NULL for !CONFIG_MEMCG.
>
> With the root memory cgroup, we directly skip the charging part, and only
> rely on __GFP_NOFAIL for the real memory allocation part.

This looks pretty ugly.  What speaks against a version of
filemap_add_folio that doesn't charge the memcg?
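To make the question concrete, here is a userspace toy model (not kernel
code) of the split being suggested: the charging entry point charges a
memcg-like counter and then delegates to an insert that does no accounting,
so a caller such as btrfs metadata could use the uncharged path directly
instead of borrowing the root cgroup. All names here (toy_memcg, toy_cache,
toy_cache_add, toy_cache_add_nocharge) are hypothetical and only model the
idea; they are not meant to mirror mm/filemap.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy memcg: a page usage counter with a hard limit. */
struct toy_memcg {
	size_t usage;
	size_t limit;
};

/* Hypothetical toy page cache: a fixed array of slots indexed by page index. */
#define TOY_CACHE_SLOTS 16

struct toy_cache {
	bool present[TOY_CACHE_SLOTS];
	struct toy_memcg *charged_to[TOY_CACHE_SLOTS]; /* NULL means uncharged */
};

/* Charge one page to @memcg; fails with -ENOMEM if the limit would be exceeded. */
static int toy_memcg_charge(struct toy_memcg *memcg)
{
	if (memcg->usage + 1 > memcg->limit)
		return -ENOMEM;
	memcg->usage++;
	return 0;
}

/* Insert with no cgroup accounting: only cache-level errors can occur. */
static int toy_cache_add_nocharge(struct toy_cache *cache, size_t index)
{
	if (index >= TOY_CACHE_SLOTS)
		return -EINVAL;
	if (cache->present[index])
		return -EEXIST;
	cache->present[index] = true;
	cache->charged_to[index] = NULL;
	return 0;
}

/* Charged insert: charge first, reuse the uncharged path, uncharge on failure. */
static int toy_cache_add(struct toy_cache *cache, size_t index,
			 struct toy_memcg *memcg)
{
	int ret = toy_memcg_charge(memcg);

	if (ret)
		return ret;
	ret = toy_cache_add_nocharge(cache, index);
	if (ret) {
		assert(memcg->usage > 0);
		memcg->usage--; /* undo the charge taken above */
		return ret;
	}
	cache->charged_to[index] = memcg;
	return 0;
}
```

Because the charged variant is layered on top of the uncharged one, the
caller that wants no accounting never needs a pointer to any cgroup, which
is the property the question above is getting at.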