On 13.04.24 05:51, Zhengchao Shao wrote:
A potential sleep-in-atomic-context issue exists in the following call path:
smc_switch_conns
  spin_lock_bh(&conn->send_lock)
  smc_switch_link_and_count
    smcr_link_put
      __smcr_link_clear
        smc_lgr_put
          __smc_lgr_free
            smc_lgr_free_bufs
              __smc_lgr_free_bufs
                smc_buf_free
                  smcr_buf_free
                    smcr_buf_unmap_link
                      smc_ib_put_memory_region
                        ib_dereg_mr
                          ib_dereg_mr_user
                            mr->device->ops.dereg_mr
If an IB driver's .dereg_mr hook can sleep, it will trigger a
"scheduling while atomic" bug, because it is reached while send_lock is
held as a spin lock. The cxgb4 and efa drivers are examples. Use a mutex
instead of the spin lock to fix it.
Fixes: 20c9398d3309 ("net/smc: Resolve the race between SMC-R link access and clear")
Signed-off-by: Zhengchao Shao <shaozhengchao@xxxxxxxxxx>
---
 net/smc/af_smc.c   |  2 +-
 net/smc/smc.h      |  2 +-
 net/smc/smc_cdc.c  | 14 +++++++-------
 net/smc/smc_core.c |  8 ++++----
 net/smc/smc_tx.c   |  8 ++++----
 5 files changed, 17 insertions(+), 17 deletions(-)
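
For illustration only, the hazard described in the commit message can be sketched as a small user-space model in C. This is not kernel code and not part of the patch: every model_* name, the atomic_depth counter, and the two switch_conns_* helpers are hypothetical stand-ins, and the put/free chain is collapsed into a single call. It only shows why a hook that may sleep is a problem under a spin lock and not under a mutex; it says nothing about whether this is the root cause of any observed crash.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
static int atomic_depth;      /* > 0 while a modeled spin lock is held */
static int sleep_violations;  /* modeled "scheduling while atomic" events */

/* Holding a spin lock puts the caller in atomic context. */
static void model_spin_lock(void)
{
    atomic_depth++;
}

static void model_spin_unlock(void)
{
    assert(atomic_depth > 0);
    atomic_depth--;
}

/* A mutex holder may sleep, so the model leaves atomic_depth untouched. */
static void model_mutex_lock(void)
{
}

static void model_mutex_unlock(void)
{
}

/* Stand-in for a driver .dereg_mr hook that may sleep. */
static void model_dereg_mr(void)
{
    if (atomic_depth > 0)
        sleep_violations++;   /* a real kernel would report a bug here */
}

/* Stand-in for the long put/free chain that ends in the hook. */
static void model_link_put(void)
{
    model_dereg_mr();
}

/* Link switch with the send lock modeled as a spin lock. */
static int switch_conns_with_spinlock(void)
{
    sleep_violations = 0;
    model_spin_lock();
    model_link_put();
    model_spin_unlock();
    return sleep_violations;
}

/* Link switch with the send lock modeled as a mutex. */
static int switch_conns_with_mutex(void)
{
    sleep_violations = 0;
    model_mutex_lock();
    model_link_put();
    model_mutex_unlock();
    return sleep_violations;
}
```

In this model, switch_conns_with_spinlock() reports one violation and switch_conns_with_mutex() reports none, which mirrors the reasoning in the commit message under the assumption that the final put really does reach a sleeping .dereg_mr.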
Hi Zhengchao,
If I understand correctly, the sleeping issue is not the core issue. It
looks more like a deadlock or a kernel pointer dereference issue. Did
you get a crash? Do you have a backtrace? Why do you think the mutex
will fix it?
Thanks,
Wenjia