On 2024/4/16 20:06, Paolo Abeni wrote: > On Sat, 2024-04-13 at 11:51 +0800, Zhengchao Shao wrote: >> Potential sleeping issue exists in the following processes: >> smc_switch_conns >> spin_lock_bh(&conn->send_lock) >> smc_switch_link_and_count >> smcr_link_put >> __smcr_link_clear >> smc_lgr_put >> __smc_lgr_free >> smc_lgr_free_bufs >> __smc_lgr_free_bufs >> smc_buf_free >> smcr_buf_free >> smcr_buf_unmap_link >> smc_ib_put_memory_region >> ib_dereg_mr >> ib_dereg_mr_user >> mr->device->ops.dereg_mr >> If scheduling exists when the IB driver implements .dereg_mr hook >> function, the bug "scheduling while atomic" will occur. For example, >> cxgb4 and efa driver. Use mutex lock instead of spin lock to fix it. > > I tried to inspect all the lock call sites, and it *look* like they are > all in process context, so the switch should be feasible. There exist some calls from tasklet, where mutex lock is infeasible. For example: - tasklet -> smc_wr_tx_tasklet_fn -> smc_wr_tx_process_cqe -> pnd_snd.handler -> smc_cdc_tx_handler -> smc_tx_pending -> smc_tx_sndbuf_nonempty -> smcr_tx_sndbuf_nonempty -> spin_lock_bh(&conn->send_lock) - tasklet -> smc_wr_rx_tasklet_fn -> smc_wr_rx_process_cqes -> smc_wr_rx_demultiplex -> smc_cdc_rx_handler -> smc_cdc_msg_validate -> spin_lock_bh(&conn->send_lock) Thanks, Guangguan Wang > > Still the fact that the existing lock is a BH variant is suspect. > Either the BH part was not needed or this can introduce subtle > regressions/issues. > > I think this deserves at least a 3rd party testing. > > Thanks, > > Paolo >