We recently encountered crashes caused by a race between accessing and
freeing a link or link group during abnormal SMC link group termination.
The crashes can be reproduced by triggering abnormal link group
termination frequently, for example by repeatedly setting RNICs up and
down.

This patch set aims to fix the race by extending the life cycle of links
and link groups, so that they cannot be referenced after they have been
cleared or freed.

v1 -> v2:
- Improve some comments.
- Move the code that wakes up the lgrs_deleted wait queue from
  smc_lgr_free() to __smc_lgr_free().
- Move the code that wakes up the links_deleted wait queue from
  smcr_link_clear() to __smcr_link_clear().
- Move smc_ibdev_cnt_dec() and put_device() from smcr_link_clear()
  to __smcr_link_clear().
- Move smc_lgr_put() to the end of __smcr_link_clear().
- Call smc_lgr_put() after the 'out' label in smcr_link_init() when
  link initialization fails.
- Change where an smc connection holds the lgr or link.
    before:
      * hold lgr in smc_lgr_register_conn().
      * hold link in smcr_lgr_conn_assign_link().
    after:
      * hold both lgr and link in smc_conn_create().
  This makes the hold site symmetrical with smc_conn_free(), which is
  where an smc connection puts the lgr or link.
- Initialize conn->freed to zero in smc_conn_create().

Wen Gu (3):
  net/smc: Resolve the race between link group access and termination
  net/smc: Introduce a new conn->lgr validity check helper
  net/smc: Resolve the race between SMC-R link access and clear

 net/smc/af_smc.c   |   6 ++-
 net/smc/smc.h      |   1 +
 net/smc/smc_cdc.c  |   3 +-
 net/smc/smc_clc.c  |   2 +-
 net/smc/smc_core.c | 120 +++++++++++++++++++++++++++++++++++++++++------------
 net/smc/smc_core.h |  12 ++++++
 net/smc/smc_diag.c |   6 +--
 7 files changed, 118 insertions(+), 32 deletions(-)

-- 
1.8.3.1
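
For readers unfamiliar with the hold/put pairing mentioned in the changelog, the idea can be pictured with a minimal user-space sketch. This is not the code from the patches: every demo_* name below is hypothetical, C11 atomics stand in for whatever reference counting primitive the kernel code uses, and only the link group side is modelled. The connection takes its reference when it is created and drops it when it is freed, so terminating the group while the connection is still alive does not release the memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a link group; not the kernel's struct. */
struct demo_lgr {
	atomic_int refcnt;	/* number of current holders */
	int *free_count;	/* bumped once when the object is really freed */
};

/* Hypothetical stand-in for a connection that uses a link group. */
struct demo_conn {
	struct demo_lgr *lgr;
};

static struct demo_lgr *demo_lgr_create(int *free_count)
{
	struct demo_lgr *lgr = malloc(sizeof(*lgr));

	if (!lgr)
		return NULL;
	atomic_init(&lgr->refcnt, 1);	/* the creator's reference */
	lgr->free_count = free_count;
	return lgr;
}

static void demo_lgr_hold(struct demo_lgr *lgr)
{
	atomic_fetch_add(&lgr->refcnt, 1);
}

static void demo_lgr_put(struct demo_lgr *lgr)
{
	int old = atomic_fetch_sub(&lgr->refcnt, 1);

	assert(old > 0);
	if (old == 1) {			/* last holder is gone: free now */
		(*lgr->free_count)++;
		free(lgr);
	}
}

/* Hold in create ... */
static void demo_conn_create(struct demo_conn *conn, struct demo_lgr *lgr)
{
	demo_lgr_hold(lgr);
	conn->lgr = lgr;
}

/* ... and put in free, so the two sites are symmetrical. */
static void demo_conn_free(struct demo_conn *conn)
{
	struct demo_lgr *lgr = conn->lgr;

	conn->lgr = NULL;
	demo_lgr_put(lgr);
}

/*
 * Terminate the group while a connection still uses it.
 * Returns how many times the group was freed, or a negative
 * value if it was freed too early or could not be allocated.
 */
static int demo_terminate_with_live_conn(void)
{
	int freed = 0;
	struct demo_conn conn;
	struct demo_lgr *lgr = demo_lgr_create(&freed);

	if (!lgr)
		return -1;
	demo_conn_create(&conn, lgr);	/* refcnt: 2 */
	demo_lgr_put(lgr);		/* termination drops creator ref: 1 */
	if (freed != 0)
		return -2;		/* would be a use-after-free risk */
	demo_conn_free(&conn);		/* last put: object is freed here */
	return freed;
}
```

In this sketch the memory is released only by the last put, whichever path issues it, which is the property the cover letter describes as extending the life cycle.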