On Wed, 1 Dec 2021 10:31:47 +0800 Dust Li wrote:
> smc_lgr_cleanup_early() meant to delete the link
> group from the link group list, but it deleted
> the list head by mistake.
>
> This may cause memory corruption since we didn't
> remove the real link group from the list and later
> memseted the link group structure.
> We got a list corruption panic when testing:
>
> [ 231.277259] list_del corruption. prev->next should be ffff8881398a8000, but was 0000000000000000
> [ 231.278222] ------------[ cut here ]------------
> [ 231.278726] kernel BUG at lib/list_debug.c:53!
> [ 231.279326] invalid opcode: 0000 [#1] SMP NOPTI
> [ 231.279803] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.46+ #435
> [ 231.280466] Hardware name: Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
> [ 231.281248] Workqueue: events smc_link_down_work
> [ 231.281732] RIP: 0010:__list_del_entry_valid+0x70/0x90
> [ 231.282258] Code: 4c 60 82 e8 7d cc 6a 00 0f 0b 48 89 fe 48 c7 c7 88 4c
> 60 82 e8 6c cc 6a 00 0f 0b 48 89 fe 48 c7 c7 c0 4c 60 82 e8 5b cc 6a 00 <0f>
> 0b 48 89 fe 48 c7 c7 00 4d 60 82 e8 4a cc 6a 00 0f 0b cc cc cc
> [ 231.284146] RSP: 0018:ffffc90000033d58 EFLAGS: 00010292
> [ 231.284685] RAX: 0000000000000054 RBX: ffff8881398a8000 RCX: 0000000000000000
> [ 231.285415] RDX: 0000000000000001 RSI: ffff88813bc18040 RDI: ffff88813bc18040
> [ 231.286141] RBP: ffffffff8305ad40 R08: 0000000000000003 R09: 0000000000000001
> [ 231.286873] R10: ffffffff82803da0 R11: ffffc90000033b90 R12: 0000000000000001
> [ 231.287606] R13: 0000000000000000 R14: ffff8881398a8000 R15: 0000000000000003
> [ 231.288337] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
> [ 231.289160] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 231.289754] CR2: 0000000000e72058 CR3: 000000010fa96006 CR4: 00000000003706f0
> [ 231.290485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 231.291211] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 231.291940] Call Trace:
> [ 231.292211] smc_lgr_terminate_sched+0x53/0xa0
> [ 231.292677] smc_switch_conns+0x75/0x6b0
> [ 231.293085] ? update_load_avg+0x1a6/0x590
> [ 231.293517] ? ttwu_do_wakeup+0x17/0x150
> [ 231.293907] ? update_load_avg+0x1a6/0x590
> [ 231.294317] ? newidle_balance+0xca/0x3d0
> [ 231.294716] smcr_link_down+0x50/0x1a0
> [ 231.295090] ? __wake_up_common_lock+0x77/0x90
> [ 231.295534] smc_link_down_work+0x46/0x60
> [ 231.295933] process_one_work+0x18b/0x350
>
> Fixes: a0a62ee15a829 ("net/smc: separate locks for SMCD and SMCR link group lists")
> Signed-off-by: Dust Li <dust.li@xxxxxxxxxxxxxxxxx>
> Acked-by: Karsten Graul <kgraul@xxxxxxxxxxxxx>
>  net/smc/smc_core.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index bb52c8b5f148..8759f9fd8113 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -625,18 +625,16 @@ int smcd_nl_get_lgr(struct sk_buff *skb, struct netlink_callback *cb)
>  void smc_lgr_cleanup_early(struct smc_connection *conn)
>  {
>  	struct smc_link_group *lgr = conn->lgr;
> -	struct list_head *lgr_list;
>  	spinlock_t *lgr_lock;
>  
>  	if (!lgr)
>  		return;
>  
>  	smc_conn_free(conn);
> -	lgr_list = smc_lgr_list_head(lgr, &lgr_lock);
>  	spin_lock_bh(lgr_lock);
>  	/* do not use this link group for new connections */
> -	if (!list_empty(lgr_list))
> -		list_del_init(lgr_list);
> +	if (!list_empty(&lgr->list))
> +		list_del_init(&lgr->list);
>  	spin_unlock_bh(lgr_lock);
>  	__smc_lgr_terminate(lgr, true);
>  }

clang has something to say about that:

net/smc/smc_core.c:634:15: warning: variable 'lgr_lock' is uninitialized when used here [-Wuninitialized]
        spin_lock_bh(lgr_lock);
                     ^~~~~~~~
net/smc/smc_core.c:628:22: note: initialize the variable 'lgr_lock' to silence this warning
        spinlock_t *lgr_lock;
                            ^
                             = NULL