Re: Questions about memcg_slabinfo.py drgn script

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Wed, 16 Mar 2022 14:16:07 -0700



On Thu, Mar 10, 2022 at 3:41 PM Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:
>
> On Thu, Mar 10, 2022 at 12:21:44PM -0800, Yosry Ahmed wrote:
> > Hi everyone,
> >
> > I was looking at the memcg_slabinfo.py drgn script that offers a
> > replacement to the deprecated memory.kmem.slabinfo. I had some
> > questions about how it collects the memcg slab stats:
>
> Hi Yosry!
>
> First, I have to admit that I haven't spent too much time optimizing
> this script for speed. So, there are almost certainly available opportunities
> to enhance it and patches are highly welcome.
>
> >
> > 1. Why does the script loop through all struct pages on the system?
> > Wouldn't it be more efficient to loop for every kmem_cache, for every
> > online kmem_cache_node, then loop through slabs_free, slabs_full, and
> > slabs_partial lists?
>
> It's somewhat tricky with SLUB because of per-cpu partial pages (I'm less
> familiar with SLAB). In theory, we have a single-linked list of such pages,
> but idk if we can reliably traverse it (given that it will be changed
> concurrently). We also will be way more dependent on SLUB internals.
> However it still might be a good optimization.
>
> >
> > This seems more consistent with how /proc/slabinfo works, and more
> > efficient. I tested this on SLAB using a crash script as I am unable
> > to run drgn on my current setup. I am not sure how correct this would
> > be for SLUB though.
>
> /proc/slabinfo has its own weaknesses, e.g. it shows systematically wrong
> numbers for slab utilization because of how it handles per-cpu partial pages
> (on SLUB).
>
Honestly, I haven't looked into this for SLUB, but it seems like a
valid optimization for SLAB. At least it is equivalent to
/proc/slabinfo, which I assume is somewhat accurate for SLAB.
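
As a rough illustration of the two traversal strategies discussed
above, here is a toy model in plain Python. It is not drgn or crash
code: the classes Slab, CacheNode and Cache and the helper make_demo()
are hypothetical stand-ins that only loosely mirror SLAB's per-node
slabs_full/slabs_partial/slabs_free lists, and it ignores concurrency
and SLUB's per-cpu partial pages entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Slab:
    """Hypothetical stand-in for a slab page."""
    cache_name: str
    active: int  # objects currently in use on this slab


@dataclass
class CacheNode:
    """Hypothetical stand-in for a per-node structure with three lists."""
    slabs_full: List[Slab] = field(default_factory=list)
    slabs_partial: List[Slab] = field(default_factory=list)
    slabs_free: List[Slab] = field(default_factory=list)


@dataclass
class Cache:
    name: str
    nodes: List[CacheNode] = field(default_factory=list)


def active_by_page_scan(all_pages: List[Optional[Slab]]) -> Dict[str, int]:
    """Visit every page on the system; non-slab pages are modeled as None."""
    counts: Dict[str, int] = {}
    for page in all_pages:
        if page is None:
            continue
        counts[page.cache_name] = counts.get(page.cache_name, 0) + page.active
    return counts


def active_by_list_walk(caches: List[Cache]) -> Dict[str, int]:
    """Visit only slab pages, via each cache's per-node lists."""
    counts: Dict[str, int] = {}
    for cache in caches:
        total = 0
        for node in cache.nodes:
            for slabs in (node.slabs_full, node.slabs_partial, node.slabs_free):
                total += sum(slab.active for slab in slabs)
        counts[cache.name] = total
    return counts


def make_demo() -> Tuple[List[Cache], List[Optional[Slab]]]:
    """Build a tiny two-cache system plus a flat page array that contains it."""
    d_full, d_part, d_free = Slab("dentry", 4), Slab("dentry", 3), Slab("dentry", 0)
    i_part, i_full = Slab("inode_cache", 1), Slab("inode_cache", 3)
    caches = [
        Cache("dentry", [CacheNode([d_full], [d_part], [d_free])]),
        Cache("inode_cache", [CacheNode([], [i_part], []), CacheNode([i_full], [], [])]),
    ]
    # Most pages on a real system are not slab pages at all.
    all_pages = [None, d_full, None, None, i_part, d_part, None, i_full, d_free, None]
    return caches, all_pages
```

In this model both functions return the same totals, but the list walk
touches five slabs while the page scan touches all ten pages. Whether
that carries over to a live SLUB system is exactly the open question
raised above.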
> >
> > 2. Before looping through pages, why does the script collect all
> > objcgs belonging to the desired memcg in a set, and then test every
> > objcg in a slab page to see whether it belongs to that memcg. Wouldn't
> > it be easier to just check objcg->memcg? AFAICT this gets updated as
> > well when the objcg is reparented.
>
> I can't think of any good reason now, however it's not obviously faster
> (I guess dereferencing a pointer in drgn can be more expensive than doing
> a few "local" comparisons, something to measure).
> If it is faster, it will be a good enhancement.
>

You are right: at least with crash, it is more expensive to dereference
the obj_cgroup pointers than to do the local comparisons!
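
For reference, the two membership checks being compared can be sketched
in plain Python as follows. This is a toy model, not the script's code:
Memcg, ObjCgroup, new_memcg() and reparent() are hypothetical names, and
id() stands in for a raw pointer value. It only shows that both checks
agree after reparenting; it says nothing about their relative cost in
drgn or crash, which still has to be measured.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Memcg:
    """Hypothetical stand-in for a memory cgroup."""
    name: str
    objcg: Optional["ObjCgroup"] = None
    # objcgs inherited from removed children (reparented)
    objcg_list: List["ObjCgroup"] = field(default_factory=list)


@dataclass(eq=False)
class ObjCgroup:
    """Hypothetical stand-in for an obj_cgroup with its back-pointer."""
    memcg: Memcg


def new_memcg(name: str) -> Memcg:
    memcg = Memcg(name)
    memcg.objcg = ObjCgroup(memcg)
    return memcg


def reparent(child: Memcg, parent: Memcg) -> None:
    """Move the child's objcgs to the parent and repoint their memcg."""
    moved = ([child.objcg] if child.objcg is not None else []) + child.objcg_list
    for objcg in moved:
        objcg.memcg = parent
        parent.objcg_list.append(objcg)
    child.objcg = None
    child.objcg_list = []


def count_via_set(memcg: Memcg, slab_objcgs: List[Optional[ObjCgroup]]) -> int:
    """Collect the memcg's objcg 'pointers' once, then do local lookups."""
    wanted = {id(objcg) for objcg in memcg.objcg_list}
    if memcg.objcg is not None:
        wanted.add(id(memcg.objcg))
    return sum(1 for o in slab_objcgs if o is not None and id(o) in wanted)


def count_via_deref(memcg: Memcg, slab_objcgs: List[Optional[ObjCgroup]]) -> int:
    """Follow objcg.memcg for every object (one dereference per object)."""
    return sum(1 for o in slab_objcgs if o is not None and o.memcg is memcg)
```

The set-based variant pays a one-time cost to gather the objcgs and then
compares plain values, while the dereferencing variant reads through a
pointer for every accounted object, which is the part that turned out to
be slow under crash.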
> >
> > Sorry for my ignorance if any of the assumptions I made are incorrect.
> > I just wanted to get more understanding of the implementation
> > decisions taken while writing the script.
>
> You're welcome!