On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> The bpf memory accounting has some known problems in container
> environment,
>
> - The container memory usage is not consistent if there's pinned bpf
>   program
>   After the container restart, the leftover bpf programs won't account
>   to the new generation, so the memory usage of the container is not
>   consistent. This issue can be resolved by introducing selectable
>   memcg, but we don't have an agreement on the solution yet. See also
>   the discussions at https://lwn.net/Articles/905150/ .
>
> - The leftover non-preallocated bpf map can't be limited
>   The leftover bpf map will be reparented, and thus it will be limited by
>   the parent, rather than the container itself. Furthermore, if the
>   parent is destroyed, it will be limited by its parent's parent, and so
>   on. It can also be resolved by introducing selectable memcg.
>
> - The memory dynamically allocated in bpf prog is charged into root memcg
>   only
>   Nowadays the bpf prog can dynamically allocate memory, for example via
>   bpf_obj_new(), but it only allocates from the global bpf_mem_alloc
>   pool, so it will charge into root memcg only. That needs to be
>   addressed by a new proposal.
>
> So let's give the user an option to disable bpf memory accounting.
>
> The idea of "cgroup.memory=nobpf" is originally by Tejun[1].

I'm not the most familiar with bpf internals, but the memcg bits and
adding the boot time flag look good to me:

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>