On Tue, Nov 15, 2022 at 11:24 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote: > > > > On 11/15/22 12:18 AM, Haris Iqbal wrote: > > On Sun, Nov 13, 2022 at 2:08 AM Guoqing Jiang<guoqing.jiang@xxxxxxxxx> wrote: > >> RTRS creates several connections per nr_cpu_ids, it makes more sense > >> to only close the path when it just allocated. > >> > >> Signed-off-by: Guoqing Jiang<guoqing.jiang@xxxxxxxxx> > >> --- > >> drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++- > >> 1 file changed, 4 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c > >> index 2cc8b423bcaa..063082d29fc6 100644 > >> --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c > >> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c > >> @@ -1833,6 +1833,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, > >> u16 version, con_num, cid; > >> u16 recon_cnt; > >> int err = -ECONNRESET; > >> + bool alloc_path = false; > >> > >> if (len < sizeof(*msg)) { > >> pr_err("Invalid RTRS connection request\n"); > >> @@ -1906,6 +1907,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, > >> pr_err("RTRS server session allocation failed: %d\n", err); > >> goto reject_w_err; > >> } > >> + alloc_path = true; > >> } > >> err = create_con(srv_path, cm_id, cid); > >> if (err) { > >> @@ -1940,7 +1942,8 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id, > >> > >> close_and_return_err: > >> mutex_unlock(&srv->paths_mutex); > >> - close_path(srv_path); > >> + if (alloc_path) > >> + close_path(srv_path); > > I think the way this is coded is, if there is a problem opening a > > connection in that srv_path, then it closes that srv_path gracefully, > > which results in all other connections in that path getting closed, > > and the IOs being failover'ed to the other path (in case of > > multipath), and then the client would trigger an auto reconnect. > > Could it possible that IO is happening in the previous connection, then > a later > failed connection try to close all the connections. Yes, the idea is if any failure during rdma_connect, we close the full path and expecting the other healthy path to take over. > > > @Jinpu can shed some more light, what would be the preferred course of > > action if there is a failure in one connection. To keep the current > > design or to avoid closing the path in case of a connection issue. > > Frankly, I am less confident about this patch and seems it is a rare > condition. > Will just drop it for now. > > Thanks, > Guoqing