On Sun, Feb 12, 2023 at 10:12:12PM +0800, Yafang Shao wrote:
> On Sat, Feb 11, 2023 at 6:39 AM Dennis Zhou <dennis@xxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > On Fri, Feb 10, 2023 at 02:05:08PM -0800, Andrew Morton wrote:
> > > On Fri, 10 Feb 2023 15:49:47 +0000 Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > >
> > > > The extra space which is used to store the obj_cgroup membership is only
> > > > valid when kmemcg is enabled. The kmemcg can be disabled via the kernel
> > > > parameter "cgroup.memory=nokmem" at runtime.
> > > > This helper is also used in non-memcg code, for example the tracepoint,
> > > > so we should fix it.
> > > >
> > > > It was found by code review when I was implementing bpf memory usage[1].
> > > > No real issue happens in production environment.
> > > >
> > > > ...
> > > >
> > > > --- a/mm/percpu-internal.h
> > > > +++ b/mm/percpu-internal.h
> > > > @@ -4,6 +4,7 @@
> > > >
> > > >  #include <linux/types.h>
> > > >  #include <linux/percpu.h>
> > > > +#include <linux/memcontrol.h>
> > > >
> > > >  /*
> > > >   * pcpu_block_md is the metadata block struct.
> > > > @@ -125,7 +126,8 @@ static inline size_t pcpu_obj_full_size(size_t size)
> > > >  	size_t extra_size = 0;
> > > >
> > > >  #ifdef CONFIG_MEMCG_KMEM
> > > > -	extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > > > +	if (!mem_cgroup_kmem_disabled())
> > > > +		extra_size += size / PCPU_MIN_ALLOC_SIZE * sizeof(struct obj_cgroup *);
> > > >  #endif
> > > >
> > > >  	return size * num_possible_cpus() + extra_size;
> > >
> >
> > Sorry I've been a bit mia...
> >
> > > Seems risky at the first look - enabling kmemcg at runtime will make
> > > prior calculations based on pcpu_obj_full_size) incorrect. But as long
> > > as this is only used for accounting I guess that's OK.
> > >
> > > What happens if we do a bunch of allocations with kmemcg enabled, then
> > > disable kmemcg then free those allocations, or some such thing. Does
> > > the accounting end up wrong?
> > >
> >
> > For now it works correctly because of 2 things. 1 - the function is only
> > called by accounting. 2 - the free path doesn't consult
> > mem_cgroup_kmem_disabled() but consults if a memcg exists for a percpu
> > allocation. If accounting is enabled, we'd always account the additional
> > memory for the memcg accounting. If it's not enabled, then percpu is
> > well unaccounted for.
> >
> > This function probably needs to be renamed a bit more carefully so it
> > doesn't bleed outside of mm/percpu.c.
> >
>
> Do you have any suggestions on the new name ?
>
> > In short, I don't think this change is correct.
>
> Could you pls be more specific ?
>

Hmmm I got ahead of myself. I misunderstood memcg_*_enabled() vs
memcg_*_disabled(). Roman clarified it just now in [1]. I was imagining
a world where we add disabled here and then eventually enabled would
propagate here too.

Another thing that was on my mind is, should a percpu object be charged
for the memcg space even if it's not in use. I now think it's yes and
then for general accounting outside of memcg, this function is correct.

Acked-by: Dennis Zhou <dennis@xxxxxxxxxx>

Andrew, I have nothing queued. Do you mind picking this up?

[1] https://lore.kernel.org/linux-mm/20230213192922.1146370-1-roman.gushchin@xxxxxxxxx/T/#u

Thanks,
Dennis
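
For anyone skimming the thread, the size calculation under discussion can
be modelled in userspace roughly as below. This is only a sketch, not the
kernel code: sketch_obj_full_size(), SKETCH_MIN_ALLOC_SIZE, nr_cpus and
the kmem_disabled flag are hypothetical stand-ins for pcpu_obj_full_size(),
PCPU_MIN_ALLOC_SIZE, num_possible_cpus() and mem_cgroup_kmem_disabled(),
and the value 4 for the minimum allocation size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for PCPU_MIN_ALLOC_SIZE; the value is illustrative. */
#define SKETCH_MIN_ALLOC_SIZE 4

/* Opaque type: only the size of a pointer to it matters here. */
struct obj_cgroup;

/*
 * Hypothetical userspace model of the patched helper.
 * nr_cpus stands in for num_possible_cpus(), and kmem_disabled stands in
 * for the result of mem_cgroup_kmem_disabled().
 */
static size_t sketch_obj_full_size(size_t size, unsigned int nr_cpus,
				   bool kmem_disabled)
{
	size_t extra_size = 0;

	assert(nr_cpus > 0);

	/* One obj_cgroup pointer per minimum-sized unit, only with kmemcg on. */
	if (!kmem_disabled)
		extra_size += size / SKETCH_MIN_ALLOC_SIZE *
			      sizeof(struct obj_cgroup *);

	/* Per-CPU copies of the object plus the optional memcg metadata. */
	return size * nr_cpus + extra_size;
}
```

Under these assumptions, a 64-byte allocation on 4 CPUs with 8-byte
pointers comes to 256 + 128 = 384 bytes with kmemcg enabled and 256 bytes
with cgroup.memory=nokmem, which is the difference the patch makes visible
to non-memcg users such as the tracepoint.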