On 2024/10/1 18:49, Christoph Hellwig wrote:
> On Sat, Sep 28, 2024 at 02:15:56PM +0930, Qu Wenruo wrote:
>> [BACKGROUND]
>> The function filemap_add_folio() charges the memory cgroup, since we
>> assume all page cache is accessible by user space processes and thus
>> needs cgroup accounting.
>> However btrfs is a special case: its metadata can get very large
>> because of its data csum support (by default 4 bytes per 4K of data,
>> and up to 32 bytes per 4K of data).
>> This means btrfs has to use the page cache for its metadata pages, to
>> take advantage of both the caching and the reclaim ability of the
>> filemap.
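For a concrete sense of the numbers above: 4 bytes of csum per 4K of
data is a 1:1024 ratio, so 1 TiB of data already needs about 1 GiB of
csum items, and with 32-byte csums (e.g. SHA-256 or BLAKE2b) that grows
to about 8 GiB.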
> FYI, in general reclaim of metadata works much better with a shrinker
> than through the page cache, because it can be object-based and
> prioritized.
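For reference, such an object-based shrinker would look roughly like
the sketch below, using the shrinker_alloc()/shrinker_register() API.
The my_cache structure and its callbacks are placeholders for
illustration only, not existing btrfs code:

#include <linux/shrinker.h>
#include <linux/atomic.h>
#include <linux/errno.h>

struct my_cache {
	atomic_long_t nr_cached;	/* number of reclaimable objects */
	struct shrinker *shrinker;
};

static unsigned long my_cache_count(struct shrinker *sh,
				    struct shrink_control *sc)
{
	struct my_cache *cache = sh->private_data;

	/* report how many objects could be freed right now */
	return atomic_long_read(&cache->nr_cached);
}

static unsigned long my_cache_scan(struct shrinker *sh,
				   struct shrink_control *sc)
{
	struct my_cache *cache = sh->private_data;
	unsigned long freed = 0;

	/*
	 * Walk up to sc->nr_to_scan objects, preferring cold and clean
	 * ones, and report how many were actually freed.  A real
	 * implementation would drop the objects themselves here.
	 */
	while (freed < sc->nr_to_scan &&
	       atomic_long_add_unless(&cache->nr_cached, -1, 0))
		freed++;

	return freed;
}

static int my_cache_shrinker_init(struct my_cache *cache)
{
	cache->shrinker = shrinker_alloc(0, "my-cache");
	if (!cache->shrinker)
		return -ENOMEM;

	cache->shrinker->count_objects = my_cache_count;
	cache->shrinker->scan_objects = my_cache_scan;
	cache->shrinker->private_data = cache;
	shrinker_register(cache->shrinker);
	return 0;
}

This way reclaim can target exactly those objects via
count_objects()/scan_objects() and prioritize them, instead of relying
on page cache LRU order.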
>> [ENHANCEMENT]
>> Instead of relying on __GFP_NOFAIL to avoid charge failures, attach
>> the metadata pages to the root memory cgroup.
>> This requires exporting the symbol mem_root_cgroup for CONFIG_MEMCG,
>> or defining mem_root_cgroup as NULL for !CONFIG_MEMCG.
>> With the root memory cgroup we skip the charging part entirely, and
>> only rely on __GFP_NOFAIL for the actual memory allocation.
> This looks pretty ugly. What speaks against a version of
> filemap_add_folio that doesn't charge the memcg?
Because so far there is only one caller with such a requirement.
Furthermore, I believe the folio API prefers not to have too many
different functions doing similar things.
E.g. the new folio interfaces only provide filemap_get_folio(),
filemap_lock_folio(), and the more generic __filemap_get_folio().
Meanwhile there are tons of page-based interfaces: find_get_page(),
find_or_create_page(), find_lock_page(), their flags variants, etc.
Thus I think something like filemap_add_folio_no_memcg_charge() would
be rejected.
Finally, it's not feasible to go with a new GFP flag either.
We already have __GFP_ACCOUNT for memcg charging purposes, but for
filemap_add_folio() the memcg is charged even if we do not pass
__GFP_ACCOUNT.
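E.g. the charge is done at the very beginning of filemap_add_folio(),
before the gfp flags matter for accounting at all. Trimmed and
paraphrased from my reading of mm/filemap.c, it is roughly:

int filemap_add_folio(struct address_space *mapping, struct folio *folio,
		      pgoff_t index, gfp_t gfp)
{
	void *shadow = NULL;
	int ret;

	/* charged to the current task's memcg, with or without __GFP_ACCOUNT */
	ret = mem_cgroup_charge(folio, NULL, gfp);
	if (ret)
		return ret;

	__folio_set_locked(folio);
	ret = __filemap_add_folio(mapping, folio, index, gfp, &shadow);
	/* ... error handling and workingset refault handling trimmed ... */
	return ret;
}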
It would be even uglier to add a __GFP_NO_ACCOUNT, and IIRC such an
attempt has already been rejected before.
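Just to make the quoted [ENHANCEMENT] part concrete, the memcg side of
the idea boils down to something like the following (only a sketch, the
exact naming and placement is up to the final patch):

#ifdef CONFIG_MEMCG
/* exported so that filesystems can attach folios to the root cgroup */
extern struct mem_cgroup *mem_root_cgroup;
#else
#define mem_root_cgroup		NULL
#endif

With that in place, btrfs can attach its metadata folios to
mem_root_cgroup so the charge becomes a no-op, and __GFP_NOFAIL only
has to cover the allocation itself.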
Thanks,
Qu