Re: [RFC PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Thu, 2 Feb 2023 16:17:18 -0800

On Thu, Feb 2, 2023 at 4:01 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Thu, Feb 02, 2023 at 11:32:27PM +0000, Yosry Ahmed wrote:
> > Pages reclaimed by means other than LRU-based reclaim are tracked
> > through reclaim_state in struct scan_control, which is stashed in
> > the current task_struct. These pages are added to the number of pages
> > reclaimed through LRUs. For memcg reclaim, these pages generally cannot
> > be linked to the memcg under reclaim and can cause an overestimated
> > count of reclaimed pages. This short series tries to address that.
>
> Can you explain why memcg-specific reclaim is calling shrinkers that
> are not marked with SHRINKER_MEMCG_AWARE?
>
> i.e. only objects that are directly associated with memcg-aware
> shrinkers should be accounted to the memcg, right? If the cache is
> global (e.g. the xfs buffer cache) then they aren't marked with
> SHRINKER_MEMCG_AWARE and so should only be called for root memcg
> (i.e. global) reclaim contexts.
>
> So if you are having accounting problems caused by memcg-specific
> reclaim on global caches freeing non-memcg accounted memory, isn't
> the problem the way the shrinkers are being called?

Not necessarily, according to my understanding.

My understanding is that we will only free slab objects accounted to
the memcg under reclaim (or one of its descendants), because we call
memcg-aware shrinkers, as you pointed out. The issue here is slab page
sharing. Ever since we started doing per-object accounting, a slab
page may hold objects accounted to different memcgs. IIUC, if we free
a slab object charged to the memcg under reclaim, and that object
happens to be the last object on the page, we will free the slab page
and count the entire page as reclaimed memory for the purpose of memcg
reclaim, even though the memcg under reclaim may only have been charged
for a small part of that page. That is where the inaccuracy comes from.
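To make the slab page sharing case concrete, here is a minimal userspace model in C. It is not kernel code: the struct and function names are hypothetical, the 4096-byte page and 512-byte objects are arbitrary, and it assumes a slab page is freed as soon as its last object is freed (a real slab allocator can keep empty slabs around). It only shows how a page-granular counter overstates what a single memcg was charged for.

```c
/*
 * Toy userspace model, not kernel code: one slab page whose objects are
 * charged to different memcgs (per-object accounting). All names and
 * sizes here are hypothetical.
 */
#include <assert.h>

#define SLAB_PAGE_BYTES 4096UL
#define OBJ_BYTES        512UL
#define OBJS_PER_PAGE   (SLAB_PAGE_BYTES / OBJ_BYTES)

struct toy_slab_page {
	int owner[OBJS_PER_PAGE];	/* memcg id charged per object, -1 if free */
	unsigned long live;		/* number of allocated objects */
};

/* Start with an empty page: no object charged to any memcg. */
static void toy_init(struct toy_slab_page *p)
{
	for (unsigned long i = 0; i < OBJS_PER_PAGE; i++)
		p->owner[i] = -1;
	p->live = 0;
}

/* Allocate object @idx and charge it to memcg @memcg. */
static void toy_alloc(struct toy_slab_page *p, unsigned long idx, int memcg)
{
	assert(idx < OBJS_PER_PAGE && p->owner[idx] == -1);
	p->owner[idx] = memcg;
	p->live++;
}

/*
 * Free object @idx. Returns the bytes a page-granular counter (in the
 * spirit of reclaim_state) would report: the whole page once the last
 * live object goes away, 0 otherwise. The charge actually dropped for
 * the owning memcg is only OBJ_BYTES either way.
 */
static unsigned long toy_free(struct toy_slab_page *p, unsigned long idx)
{
	assert(idx < OBJS_PER_PAGE && p->owner[idx] != -1);
	p->owner[idx] = -1;
	p->live--;
	return p->live == 0 ? SLAB_PAGE_BYTES : 0;
}
```

For example, with seven objects charged to memcg 1 and one charged to memcg 2, freeing memcg 1's objects first reports nothing. Freeing memcg 2's single 512-byte object during its reclaim then reports the full 4096-byte page.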

Please correct me if I am wrong.

>
> > Patch 1 just refactors updating reclaim_state into a helper
> > function and renames reclaimed_slab to reclaimed, with a comment
> > describing its true purpose.
> >
> > Patch 2 ignores pages reclaimed outside of LRU reclaim in memcg reclaim.
> >
> > The original draft was a little bit different. It also kept track of
> > uncharged objcg pages, and reported them only in memcg reclaim and only
> > if the memcg they were uncharged from is in the subtree of the memcg
> > under reclaim. This was an attempt to make reporting of memcg reclaim
> > even more accurate, but was dropped due to a questionable complexity
> > vs. benefit tradeoff. It can be revived if there is interest.
> >
> > Yosry Ahmed (2):
> >   mm: vmscan: refactor updating reclaimed pages in reclaim_state
> >   mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
> >
> >  fs/inode.c           |  3 +--
>
> Inodes and inode mapping pages are directly charged to the memcg
> that allocated them and the shrinker is correctly marked as
> SHRINKER_MEMCG_AWARE. Freeing the pages attached to the inode will
> account them correctly to the related memcg, regardless of which
> memcg is triggering the reclaim.  Hence I'm not sure that skipping
> the accounting of the reclaimed memory is even correct in this case;

Please note that we are not skipping any accounting here. The pages
are still uncharged from the memcgs they are charged to (the allocating
memcgs, as you pointed out). We just do not report them in the return
value of try_to_free_mem_cgroup_pages(), to avoid over-reporting.
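The distinction between uncharging and reporting can be sketched as follows. This is a hypothetical userspace model, not the code from the series: toy_scan_control, toy_reclaim_state and the two helpers only loosely mirror scan_control, reclaim_state and the helper from patch 1. The model does not track uncharging at all, since that happens independently of the count reported back to the caller.

```c
/*
 * Toy model of the reporting change described above. Struct layouts and
 * helper names are hypothetical simplifications, not the vmscan.c code.
 * Uncharging is not modeled: in the kernel it happens regardless, and
 * only the reported count differs.
 */
#include <assert.h>
#include <stdbool.h>

struct toy_reclaim_state {
	unsigned long reclaimed;	/* pages freed outside LRU reclaim */
};

struct toy_scan_control {
	bool memcg_reclaim;		/* true for memcg-targeted reclaim */
	unsigned long nr_reclaimed;	/* pages reported to the caller */
	struct toy_reclaim_state *rs;	/* stand-in for current->reclaim_state */
};

/* Record pages freed by non-LRU means, e.g. slab or inode pruning. */
static void toy_account_reclaimed(struct toy_scan_control *sc,
				  unsigned long pages)
{
	if (sc->rs)
		sc->rs->reclaimed += pages;
}

/*
 * Fold non-LRU frees into the reported total only for global reclaim.
 * The counter is cleared in both cases so a later flush cannot pick up
 * stale frees.
 */
static void toy_flush_reclaim_state(struct toy_scan_control *sc)
{
	if (!sc->rs)
		return;
	if (!sc->memcg_reclaim)
		sc->nr_reclaimed += sc->rs->reclaimed;
	sc->rs->reclaimed = 0;
}
```

With this model, memcg reclaim that frees 4 pages via shrinkers on top of 10 LRU pages reports 10, while global reclaim under the same conditions reports 14.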

> I think the code should still be accounting for all pages that
> belong to the memcg being scanned that are reclaimed, not ignoring
> them altogether...

100% agree. Ideally I would want to:
- For pruned inodes: report all freed pages for global reclaim, and
only report pages charged to the memcg under reclaim for memcg
reclaim.
- For slab: report all freed pages for global reclaim, and only report
uncharged objcg pages from the memcg under reclaim for memcg reclaim.
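The ideal reporting described above boils down to a subtree check per freed page. The sketch below is a hypothetical userspace model: each freed unit is treated as one page whose charge belongs to a single memcg (for example an inode's page cache page, or a page-sized objcg uncharge), and toy_is_descendant() plays roughly the role mem_cgroup_is_descendant() plays in the kernel. It does not model how that ownership would actually be tracked during reclaim, which is where the extra complexity lies.

```c
/*
 * Toy model of "ideal" reporting: global reclaim reports every freed
 * page; memcg reclaim reports only pages whose charge belonged to the
 * target memcg or one of its descendants. Hypothetical types and names.
 */
#include <assert.h>
#include <stddef.h>

struct toy_memcg {
	struct toy_memcg *parent;	/* NULL for the root */
};

/* Is @memcg equal to @root or somewhere inside its subtree? */
static int toy_is_descendant(const struct toy_memcg *memcg,
			     const struct toy_memcg *root)
{
	for (; memcg; memcg = memcg->parent)
		if (memcg == root)
			return 1;
	return 0;
}

/*
 * Number of pages to report for a batch of @n freed pages. @owners[i] is
 * the memcg the i-th page was charged to (NULL if never charged).
 * @target is NULL for global reclaim.
 */
static unsigned long toy_pages_to_report(const struct toy_memcg *target,
					 const struct toy_memcg **owners,
					 size_t n)
{
	unsigned long report = 0;

	for (size_t i = 0; i < n; i++) {
		if (!target ||
		    (owners[i] && toy_is_descendant(owners[i], target)))
			report++;
	}
	return report;
}
```

For a tree root -> {a -> a_child, b} and freed pages owned by a, a_child, b and no memcg, global reclaim reports 4, reclaim of a reports 2, and reclaim of b reports 1.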

My only concern was that people would find this too complex to be
worth it. If people agree this should be the approach to follow, I can
prepare patches for it. I originally implemented this for slab pages,
but held off on sending it.

>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx