Re: [LSF/MM/BPF Topic] Performance improvement for Memory Cgroups

On Tue, Mar 18, 2025 at 11:19:42PM -0700, Shakeel Butt wrote:
> A bit late but let me still propose a session on topics related to memory
> cgroups. Last year at LSFMM 2024, we discussed [1] the potential
> deprecation of memcg v1. Since then we have made very good progress in that
> regard. We have moved the v1-only code into a separate file and made it not
> compile by default, have added warnings to many v1-only interfaces and have
> removed a lot of v1-only code. This year, I want to focus on the performance
> of memory cgroups, particularly improving the cost of charging and stats.
> 
> At a high level we can partition memory charging into three cases. First
> is user memory (anon & file), second is kernel memory (slub mostly) and
> third is network memory. For network memory, [2] has described some of the
> challenges. Similarly for kernel memory, we had to revert patches where memcg
> charging was too expensive [3,4].
> 
> I want to discuss and brainstorm different ways to further optimize the
> memcg charging for all these types of memory. I am at the moment prototyping
> multi-memcg support for per-cpu memcg stocks and would like to see what else
> we can do.

For slab memory, I have an idea:

Deferring the uncharging of slab objects on free until the CPU slab and
per-CPU partial slabs are moved to the per-node partial slab list
might be beneficial.

Something like:

    0. The SLUB allocator defers uncharging freed objects if the slab
       they belong to is the CPU slab or is on the per-CPU partial slab
       list.

    1. memcg_slab_post_alloc_hook() does:
       1.1 Skips charging, if the object is already charged to the same
           memcg and has not been uncharged yet.
       1.2 Uncharges the object if it is charged to a different memcg
           and then charges it to current memcg.
       1.3 Charges the object if it is not currently charged to any memcg.

    2. deactivate_slab() and __put_partials() uncharge free objects
       that were not uncharged yet before moving them to the per-node
       partial slab list.
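
To make the bookkeeping concrete, here is a rough userspace sketch of
steps 0-2. Everything in it (struct memcg, struct slab_obj, struct slab,
SLAB_OBJS and the sketch_*() helpers) is a hypothetical stand-in that I
made up for illustration, not the kernel's actual SLUB or memcg code, and
it ignores locking, obj_cgroup references and per-object sizes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLAB_OBJS 4	/* hypothetical: objects per slab in this toy model */

/* Hypothetical stand-in for a memory cgroup; not the kernel's struct. */
struct memcg {
	long charged;		/* objects currently charged to this memcg */
	long charge_ops;	/* charge + uncharge operations performed */
};

struct slab_obj {
	struct memcg *owner;	/* memcg the object is charged to, or NULL */
	bool in_use;
};

struct slab {
	bool on_cpu_list;	/* CPU slab or on a per-CPU partial list */
	struct slab_obj objs[SLAB_OBJS];
};

static void charge(struct memcg *cg, struct slab_obj *obj)
{
	cg->charged++;
	cg->charge_ops++;
	obj->owner = cg;
}

static void uncharge(struct slab_obj *obj)
{
	obj->owner->charged--;
	obj->owner->charge_ops++;
	obj->owner = NULL;
}

/* Step 1: what the post-alloc hook would decide for one object. */
static void sketch_post_alloc(struct slab_obj *obj, struct memcg *cur)
{
	assert(!obj->in_use);
	obj->in_use = true;

	if (obj->owner == cur)
		return;		/* 1.1: stale charge matches, skip charging */
	if (obj->owner)
		uncharge(obj);	/* 1.2: stale charge to a different memcg */
	charge(cur, obj);	/* 1.2 and 1.3: charge to the current memcg */
}

/* Step 0: defer the uncharge while the slab stays on a per-CPU list. */
static void sketch_free(struct slab *s, struct slab_obj *obj)
{
	assert(obj->in_use);
	obj->in_use = false;

	if (!s->on_cpu_list && obj->owner)
		uncharge(obj);	/* slab is not per-CPU: uncharge right away */
}

/* Step 2: settle deferred uncharges before the slab leaves the CPU. */
static void sketch_move_to_node_partial(struct slab *s)
{
	for (size_t i = 0; i < SLAB_OBJS; i++) {
		struct slab_obj *obj = &s->objs[i];

		if (!obj->in_use && obj->owner)
			uncharge(obj);
	}
	s->on_cpu_list = false;
}
```

In this model, an alloc/free/alloc sequence from the same memcg on a
per-CPU slab costs one charge operation instead of three. The flip side
is that a freed object stays counted against its old memcg until it is
reallocated or the slab is moved off the CPU.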

Unless 1) we have tasks belonging to many different memcgs on each CPU
(I'm not an expert on the scheduler's interaction with cgroups, though),
or 2) load balancing migrates tasks between CPUs too frequently,
many allocations should hit case 1.1 (Oh, it's already charged to the same
memcg so skip charging) in the hot path, right?

Some experiments are needed to determine whether this idea is actually
beneficial.

Or has a similar approach been tried before?

-- 
Cheers,
Harry

> One additional interesting observation from our fleet is that the cost of 
> memory charging increases for the users of memory.low and memory.min. Basically
> propagate_protected_usage() becomes very prominently visible in the perf
> traces.
> 
> Other than charging, the memcg stats infra is also very expensive and a lot
> of CPU cycles in our fleet are spent on maintaining these stats. Memcg stats
> use the rstat infrastructure, which is designed for fast updates and slow
> readers. The updaters put the cgroup in a per-cpu update tree while the stats
> readers flush the update trees of all the cpus. For memcg, the flushes have
> become very expensive and over the years we have added ratelimiting to limit
> the cost. I want to discuss what else we can do to further improve the memcg
> stats.
> 
> Other than the performance of charging and memcg stats, time permitting, we
> can discuss other memcg topics like new features or something still lacking.
> 
> [1] https://lwn.net/Articles/974575/
> [2] https://lore.kernel.org/all/20250307055936.3988572-1-shakeel.butt@xxxxxxxxx/
> [3] 3754707bcc3e ("Revert "memcg: enable accounting for file lock caches"")
> [4] 0bcfe68b8767 ("Revert "memcg: enable accounting for pollfd and select bits arrays"")
> 



