On Mon, 2021-10-18 at 19:12 +0300, Vlad Buslov wrote:
> On Mon 18 Oct 2021 at 18:42, Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> > > We got a use-after-free with very similar trace [0] during nightly
> > > regression. The issue happens when ip link up/down state is flipped
> > > several times in loop and doesn't reproduce for me manually. The fact
> > > that it didn't reproduce for me after running test ten times suggests
> > > that it is either very hard to reproduce or that it is a result of some
> > > interaction between several tests in our suite.
> > >
> > > [0]:
> > >
> > > [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> > > [ 3187.890694] ==================================================================
> > > [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> > > [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
> >
> > Hm, not sure how similar it is. This one looks like channel was freed
> > without deleting NAPI. Do you have list debug enabled?
>
> Yes, CONFIG_DEBUG_LIST is enabled.

do you have core dumps? let's enable kernel.panic_on_oops with core
dumps and look at it next time we see this, I really don't think mlx5
is leaking..