> On Jul 27, 2023, at 15:02, Muchun Song <muchun.song@xxxxxxxxx> wrote:
>
>
>
>> On Jul 27, 2023, at 00:52, Alexander Potapenko <glider@xxxxxxxxxx> wrote:
>>
>> On Tue, Jul 25, 2023 at 6:21 PM Alexander Potapenko <glider@xxxxxxxxxx> wrote:
>>>
>>> On Tue, Jul 25, 2023 at 3:39 PM Naresh Kamboju
>>> <naresh.kamboju@xxxxxxxxxx> wrote:
>>>>
>>>> On Tue, 25 Jul 2023 at 17:22, Alexander Potapenko <glider@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, Jul 25, 2023 at 11:59 AM Alexander Potapenko <glider@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Mon, Jul 24, 2023 at 2:10 PM Naresh Kamboju
>>>>>> <naresh.kamboju@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Mon, 24 Jul 2023 at 15:50, Alexander Potapenko <glider@xxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On Sat, Jul 22, 2023 at 6:37 PM Linus Torvalds
>>>>>>>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> [ Removed the stable reviewers, bringing in the kfence people ]
>>>>>>>>>
>>>>>>>>> See
>>>>>>>>>
>>>>>>>>> https://lore.kernel.org/lkml/CA+G9fYvgy22wiY=c3wLOrCM6o33636abhtEynXhJkqxJh4ca0A@xxxxxxxxxxxxxx/
>>>>>>>>>
>>>>>>>>> for the original report. The warning was introduced in 8f0b36497303
>>>>>>>>> ("mm: kfence: fix objcgs vector allocation"), and Google doesn't find
>>>>>>>>> any other cases of this.
>>>>>>>>>
>>>>>>>>> Anybody?
>>>>>>>>>
>>>>>>>>> Linus
>>>>>>>>>
>>>>>>>>
>>>
>>> Muchun, any chance you know under what circumstances a KFENCE object
>>> has its meta->objcg set to a non-NULL value?
>>> It seems to be a quite rare case, and I've only seen it in live
>>> radix_tree_node objects.
>>> Since the check here:
>>> https://elixir.bootlin.com/linux/latest/source/mm/kfence/core.c#L1097
>>> ensures that this value is NULL when the object is freed, where is the
>>> code that is supposed to zero it?
>>> Could there be a race somewhere?
>>
>>
>> I am still puzzled about what is going on.
>>
>> As far as I can see, when KFENCE pool is initialized, for ith object
>> page in the pool its page_slab()->memcg_data is set to a value derived
>> from kfence_metadata[i].objcg
>> Because KFENCE objects always occupy one page, no two objects are
>> expected to share memcg_data at any time.
>>
>> When slab_alloc_node() is called, it first invokes
>> slab_pre_alloc_hook(), figures out the obj_cgroup and charges it for
>> the allocated memory. The obj_cgroup is returned to slab_alloc_node()
>> and after KFENCE allocation succeeds is passed to
>> slab_post_alloc_hook(), which then writes obj_cgroup to
>> *(page_slab(object)->memcg_data).
>>
>> When an object is deallocated, slab_free() calls
>> memcg_slab_free_hook(), which zeroes *(page_slab(object)->memcg_data)
>> and passes the object to kfence_free().
>> At this point the object's meta->objcg must be NULL, so the warning
>> should not be firing.
>
> At least, totally agree. This call stack comes from slab_free() which
> makes sure memcg_slab_free_hook() is called before kfence_free(), so
> meta->objcg must be NULL. Otherwise, seems something is corrupted. So
> I really want to know what's the value of "meta->objcg" when the warning
> is firing (e.g. whether it is a valid pointer or does the last bit is
> set with MEMCG_DATA_OBJCGS). Maybe we could improve the warning message,

Sorry for the confusion: meta->objcg should be an objcg pointer, so it
cannot have the MEMCG_DATA_OBJCGS bit set.

> e.g. print the current value of "meta->objcg".
>
> Thanks.
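
To make the expected invariant concrete, here is a minimal userspace
sketch in C of the alloc/free flow described in this thread. It is not
kernel code: every model_* name is hypothetical, and encoding memcg_data
as the address of the per-object objcg slot OR'ed with a flag bit is only
my assumption about what "derived from kfence_metadata[i].objcg" means.
The free-side check returns whether the warning would fire and prints the
stale value, which is the kind of extra information suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace model only: none of these names are kernel APIs. */
#define MODEL_NUM_OBJECTS 4
#define MODEL_DATA_OBJCGS ((uintptr_t)0x1) /* stand-in for MEMCG_DATA_OBJCGS */

struct model_objcg { int id; };

/* Models one kfence_metadata[] entry; only the objcg slot matters here. */
struct model_meta { struct model_objcg *objcg; };

/* Models the slab of one object page; memcg_data is a tagged pointer. */
struct model_slab { uintptr_t memcg_data; };

static struct model_meta model_metadata[MODEL_NUM_OBJECTS];
static struct model_slab model_slabs[MODEL_NUM_OBJECTS];

/* Pool init: page i's memcg_data refers to metadata[i].objcg (assumed encoding). */
void model_pool_init(void)
{
	for (size_t i = 0; i < MODEL_NUM_OBJECTS; i++) {
		model_metadata[i].objcg = NULL;
		model_slabs[i].memcg_data =
			(uintptr_t)&model_metadata[i].objcg | MODEL_DATA_OBJCGS;
	}
}

/* Strip the flag bit to recover the address of the objcg slot. */
struct model_objcg **model_slab_objcgs(const struct model_slab *slab)
{
	return (struct model_objcg **)(slab->memcg_data & ~MODEL_DATA_OBJCGS);
}

/* Models the post-alloc hook: record the charged objcg through memcg_data. */
void model_post_alloc_hook(size_t i, struct model_objcg *objcg)
{
	assert(i < MODEL_NUM_OBJECTS);
	*model_slab_objcgs(&model_slabs[i]) = objcg;
}

/* Models the free hook: zero the slot before the object is handed back. */
void model_free_hook(size_t i)
{
	assert(i < MODEL_NUM_OBJECTS);
	*model_slab_objcgs(&model_slabs[i]) = NULL;
}

/*
 * Models the free-side check: returns 1 if the warning would fire, and
 * prints the stale value so a valid pointer can be told from a tagged one.
 */
int model_kfence_free(size_t i)
{
	assert(i < MODEL_NUM_OBJECTS);
	struct model_objcg *objcg = model_metadata[i].objcg;

	if (objcg != NULL) {
		fprintf(stderr, "model: object %zu freed with objcg=%p (low bit %s)\n",
			i, (void *)objcg,
			((uintptr_t)objcg & MODEL_DATA_OBJCGS) ? "set" : "clear");
		return 1;
	}
	return 0;
}
```

In this model, calling model_kfence_free() right after
model_post_alloc_hook() reports a stale objcg, while calling
model_free_hook() first makes the check pass. So if the real ordering
holds, a non-NULL value at free time points at corruption or at a path
that bypasses the hook, and the printed value would help tell which.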