On Fri, 22 Apr 2022 at 07:09, Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote:
>
> On Thu, Apr 21, 2022 at 11:12:17AM +0200, Marco Elver wrote:
> > On Thu, Apr 21, 2022 at 01:58AM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    559089e0a93d vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLO..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=10853220f00000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=2e1f9b9947966f42
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=ffe71f1ff7f8061bcc98
> > > compiler:       aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > > userspace arch: arm64
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+ffe71f1ff7f8061bcc98@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 2216 at mm/kfence/core.c:1022 __kfence_free+0x84/0xc0 mm/kfence/core.c:1022
> >
> > That's this warning in __kfence_free:
> >
> >     #ifdef CONFIG_MEMCG
> >             KFENCE_WARN_ON(meta->objcg);
> >     #endif
> >
> > introduced in 8f0b36497303 ("mm: kfence: fix objcgs vector allocation").
> >
> > Muchun, are there any circumstances where the assumption may be broken?
> > Or a new bug elsewhere?
>
> meta->objcg always should be NULL when reaching __kfence_free().
> In theory, meta->objcg should be cleared via memcg_slab_free_hook().
>
> I found the following code snippet in do_slab_free().
>
>     /* memcg_slab_free_hook() is already called for bulk free. */
>     if (!tail)
>             memcg_slab_free_hook(s, &head, 1);
>
> The only posibility is @tail is not NULL, which is the case of
> kmem_cache_free_bulk(). However, here the call trace is kfree(),
> it seems to be impossible that missing call memcg_slab_free_hook().

Fair enough - we can probably wait for the bug to reoccur on another
instance, and until then assume something else went wrong. What is
slightly suspicious is that it only occurred once on a QEMU TCG arm64
MTE instance.

Thanks,
-- Marco