On 10/26/20 15:25, Jason Gunthorpe wrote:
> There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED: either the
> handler triggers a completion and another thread does rdma_connect(), or the
> handler directly calls rdma_connect().
>
> In all cases rdma_connect() needs to hold the handler_mutex, but when
> handlers are invoked this is already held by the core code. This causes
> ULPs using the second method to deadlock.
>
> Provide an rdma_connect_locked() and have all ULPs call it from their
> handlers.
>
> Reported-by: Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx>
> Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> ---
>  drivers/infiniband/core/cma.c            | 39 +++++++++++++++++++++---
>  drivers/infiniband/ulp/iser/iser_verbs.c |  2 +-
>  drivers/infiniband/ulp/rtrs/rtrs-clt.c   |  4 +--
>  drivers/nvme/host/rdma.c                 | 10 +++---
>  include/rdma/rdma_cm.h                   | 13 +-------
>  net/rds/ib_cm.c                          |  5 +--
>  6 files changed, 47 insertions(+), 26 deletions(-)
>
> Seems people are not testing these four ULPs against rdma-next.
>
> Here is a quick fix for the issue:
> https://lore.kernel.org/r/3b1f7767-98e2-93e0-b718-16d1c5346140@xxxxxxxxxxxxxxx
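A minimal userspace sketch of the locking problem the patch describes, using POSIX threads rather than kernel code. All names here (struct cm_id, cm_connect(), cm_connect_locked(), deliver_route_resolved(), the STATE_* values) are hypothetical stand-ins for the RDMA CM internals, not the real kernel API. The mutex is created as PTHREAD_MUTEX_ERRORCHECK so that a thread relocking a mutex it already holds gets EDEADLK back instead of hanging, which makes the second flow's deadlock observable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for an RDMA CM ID; not the kernel's structure. */
struct cm_id {
    pthread_mutex_t handler_mutex;
    int state;
};

enum { STATE_IDLE, STATE_ROUTE_RESOLVED, STATE_CONNECT };

/* Caller must already hold id->handler_mutex (e.g. from inside a handler). */
static int cm_connect_locked(struct cm_id *id)
{
    if (id->state != STATE_ROUTE_RESOLVED)
        return -EINVAL;
    id->state = STATE_CONNECT;
    return 0;
}

/* For callers outside a handler (first flow): takes handler_mutex itself. */
static int cm_connect(struct cm_id *id)
{
    int err = pthread_mutex_lock(&id->handler_mutex);
    if (err)
        return -err; /* EDEADLK: this thread already holds the mutex */
    int ret = cm_connect_locked(id);
    pthread_mutex_unlock(&id->handler_mutex);
    return ret;
}

typedef int (*cm_handler)(struct cm_id *id);

/* "Core" side: delivers ROUTE_RESOLVED with handler_mutex held. */
static int deliver_route_resolved(struct cm_id *id, cm_handler handler)
{
    int err = pthread_mutex_lock(&id->handler_mutex);
    if (err)
        return -err;
    id->state = STATE_ROUTE_RESOLVED;
    int ret = handler(id);
    pthread_mutex_unlock(&id->handler_mutex);
    return ret;
}

/* Second flow before the fix: the handler calls the locking variant. */
static int handler_calls_connect(struct cm_id *id)
{
    return cm_connect(id);
}

/* Second flow after the fix: the handler calls the _locked variant. */
static int handler_calls_connect_locked(struct cm_id *id)
{
    return cm_connect_locked(id);
}

static void cm_id_init(struct cm_id *id)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    /* ERRORCHECK reports a self-relock as EDEADLK instead of hanging. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&id->handler_mutex, &attr);
    assert(rc == 0);
    (void)rc;
    pthread_mutexattr_destroy(&attr);
    id->state = STATE_IDLE;
}
```

In this sketch, deliver_route_resolved() with handler_calls_connect returns -EDEADLK, because the handler tries to take a mutex its own thread already holds; with handler_calls_connect_locked it returns 0 and leaves the ID in STATE_CONNECT. The first flow (a completion wakes another thread) is unaffected, since that other thread does not hold handler_mutex and can keep using the locking cm_connect().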
I no longer see the previous call trace with this patch applied.

Tested-by: Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx>

Thanks,
Guoqing