On Thu, Jul 16, 2020 at 2:28 PM David Rientjes <rientjes@xxxxxxxxxx> wrote:
>
> On Thu, 16 Jul 2020, Shakeel Butt wrote:
>
> > > Userspace can lack insight into the amount of memory that can be reclaimed
> > > from a memcg based on values from memory.stat. Two specific examples:
> > >
> > >  - Lazy freeable memory (MADV_FREE) that are clean anonymous pages on the
> > >    inactive file LRU that can be quickly reclaimed under memory pressure
> > >    but otherwise shows up as mapped anon in memory.stat, and
> > >
> > >  - Memory on deferred split queues (thp) that are compound pages that can
> > >    be split and uncharged from the memcg under memory pressure, but
> > >    otherwise shows up as charged anon LRU memory in memory.stat.
> > >
> > > Both of this anonymous usage is also charged to memory.current.
> > >
> > > Userspace can currently derive this information but it depends on kernel
> > > implementation details for how this memory is handled for the purposes of
> > > reclaim (anon on inactive file LRU or unmapped anon on the LRU).
> > >
> > > For the purposes of writing portable userspace code that does not need to
> > > have insight into the kernel implementation for reclaimable memory, this
> > > exports a stat that reveals the amount of anonymous memory that can be
> > > reclaimed and uncharged from the memcg to start new applications.
> > >
> > > As the kernel implementation evolves for memory that can be reclaimed
> > > under memory pressure, this stat can be kept consistent.
> > >
> > > Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> > > ---
> > >  Documentation/admin-guide/cgroup-v2.rst |  6 +++++
> > >  mm/memcontrol.c                         | 31 +++++++++++++++++++++++++
> > >  2 files changed, 37 insertions(+)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1296,6 +1296,12 @@ PAGE_SIZE multiple when read back.
> > >               Amount of memory used in anonymous mappings backed by
> > >               transparent hugepages
> > >
> > > +         anon_reclaimable
> > > +             The amount of charged anonymous memory that can be reclaimed
> > > +             under memory pressure without swap. This currently includes
> > > +             lazy freeable memory (MADV_FREE) and compound pages that can be
> > > +             split and uncharged.
> > > +
> > >           inactive_anon, active_anon, inactive_file, active_file, unevictable
> > >               Amount of memory, swap-backed and filesystem-backed,
> > >               on the internal memory management lists used by the
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
> > >       return false;
> > >  }
> > >
> > > +/*
> > > + * Returns the amount of anon memory that is charged to the memcg that is
> > > + * reclaimable under memory pressure without swap, in pages.
> > > + */
> > > +static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
> > > +{
> > > +     long deferred, lazyfree;
> > > +
> > > +     /*
> > > +      * Deferred pages are charged anonymous pages that are on the LRU but
> > > +      * are unmapped. These compound pages are split under memory pressure.
> > > +      */
> > > +     deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
> > > +                            memcg_page_state(memcg, NR_INACTIVE_ANON) -
> > > +                            memcg_page_state(memcg, NR_ANON_MAPPED), 0);
> >
> > Please note that the NR_ANON_MAPPED does not include tmpfs memory but
> > NR_[IN]ACTIVE_ANON does include the tmpfs.
> >
> > > +     /*
> > > +      * Lazyfree pages are charged clean anonymous pages that are on the file
> > > +      * LRU and can be reclaimed under memory pressure.
> > > +      */
> > > +     lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
> > > +                            memcg_page_state(memcg, NR_INACTIVE_FILE) -
> > > +                            memcg_page_state(memcg, NR_FILE_PAGES), 0);
> >
> > Similarly NR_FILE_PAGES includes tmpfs memory but NR_[IN]ACTIVE_FILE does not.
> >
>
> Ah, so this adds to the motivation of providing the anon_reclaimable stat
> because the calculation becomes even more convoluted and completely based
> on the kernel implementation details for both lazyfree memory and deferred
> split queues.

Yes, I agree.

> Did you have a calculation in mind for
> memcg_anon_reclaimable()?

For deferred, "memcg->deferred_split_queue.split_queue_len" should be
usable. For lazyfree, NR_ACTIVE_FILE + NR_INACTIVE_FILE + NR_SHMEM -
NR_FILE_PAGES seems like the right formula.
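
To make the lazyfree formula concrete, here is a rough userspace sketch of
the same arithmetic. It is an illustration only, not part of the patch. It
assumes the cgroup v2 memory.stat keys "active_file", "inactive_file",
"shmem" and "file" correspond to NR_ACTIVE_FILE, NR_INACTIVE_FILE, NR_SHMEM
and NR_FILE_PAGES, and that the values are in bytes. The helper names
stat_value() and lazyfree_estimate() are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up one "name value" entry in memory.stat-formatted text.
 * Returns 0 if the key is absent. (Hypothetical helper.)
 */
static unsigned long long stat_value(const char *stats, const char *key)
{
	size_t klen = strlen(key);
	const char *line = stats;

	while (line && *line) {
		/* Match the whole key at line start, followed by a space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return strtoull(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return 0;
}

/*
 * Estimate of lazyfree (MADV_FREE) memory, in bytes:
 *   active_file + inactive_file + shmem - file, clamped at zero.
 * The key-to-counter mapping is an assumption, see the note above.
 */
static unsigned long long lazyfree_estimate(const char *stats)
{
	unsigned long long lru = stat_value(stats, "active_file") +
				 stat_value(stats, "inactive_file") +
				 stat_value(stats, "shmem");
	unsigned long long file = stat_value(stats, "file");

	return lru > file ? lru - file : 0;
}
```

A caller would read a memcg's memory.stat into a buffer and pass it to
lazyfree_estimate(). The result moves whenever the kernel changes which LRU
lazyfree pages sit on, which is the portability problem that motivates
exporting a dedicated stat.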