Re: [PATCH RFC 03/12] RDMA/rtrs-srv: Only close srv_path if it is just allocated

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 15, 2022 at 11:24 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
>
>
>
> On 11/15/22 12:18 AM, Haris Iqbal wrote:
> > On Sun, Nov 13, 2022 at 2:08 AM Guoqing Jiang<guoqing.jiang@xxxxxxxxx>  wrote:
> >> RTRS creates several connections per nr_cpu_ids, it makes more sense
> >> to only close the path when it just allocated.
> >>
> >> Signed-off-by: Guoqing Jiang<guoqing.jiang@xxxxxxxxx>
> >> ---
> >>   drivers/infiniband/ulp/rtrs/rtrs-srv.c | 5 ++++-
> >>   1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-srv.c b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> >> index 2cc8b423bcaa..063082d29fc6 100644
> >> --- a/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> >> +++ b/drivers/infiniband/ulp/rtrs/rtrs-srv.c
> >> @@ -1833,6 +1833,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
> >>          u16 version, con_num, cid;
> >>          u16 recon_cnt;
> >>          int err = -ECONNRESET;
> >> +       bool alloc_path = false;
> >>
> >>          if (len < sizeof(*msg)) {
> >>                  pr_err("Invalid RTRS connection request\n");
> >> @@ -1906,6 +1907,7 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
> >>                          pr_err("RTRS server session allocation failed: %d\n", err);
> >>                          goto reject_w_err;
> >>                  }
> >> +               alloc_path = true;
> >>          }
> >>          err = create_con(srv_path, cm_id, cid);
> >>          if (err) {
> >> @@ -1940,7 +1942,8 @@ static int rtrs_rdma_connect(struct rdma_cm_id *cm_id,
> >>
> >>   close_and_return_err:
> >>          mutex_unlock(&srv->paths_mutex);
> >> -       close_path(srv_path);
> >> +       if (alloc_path)
> >> +               close_path(srv_path);
> > I think the way this is coded is, if there is a problem opening a
> > connection in that srv_path, then it closes that srv_path gracefully,
> > which results in all other connections in that path getting closed,
> > and the IOs being failover'ed to the other path (in case of
> > multipath), and then the client would trigger an auto reconnect.
>
> Could it possible that IO is happening in the previous connection, then
> a later
> failed connection try to close all the connections.

Yes, the idea is if any failure during rdma_connect, we close the full
path and expecting the other healthy path
to take over.
>
> > @Jinpu can shed some more light, what would be the preferred course of
> > action if there is a failure in one connection. To keep the current
> > design or to avoid closing the path in case of a connection issue.
>
> Frankly, I am less confident about this patch and seems it is a rare
> condition.
> Will just drop it for now.
>
> Thanks,
> Guoqing



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux