RE: [PATCH v2 00/14] rdma/siw: implement non-blocking connect

> -----Original Message-----
> From: Stefan Metzmacher <metze@xxxxxxxxx>
> Sent: Wednesday, 15 June 2022 17:27
> To: Bernard Metzler <BMT@xxxxxxxxxxxxxx>
> Cc: Stefan Metzmacher <metze@xxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx
> Subject: [EXTERNAL] [PATCH v2 00/14] rdma/siw: implement non-blocking
> connect
> 
> Hi Bernard,
> 
> as written a few months ago, I have a patchset with a lot
> of fixes for siw.ko.
> 

Hi Stefan, much appreciated! I think the erdma driver
did a good job implementing something similar, but w/o
the need to look into MPA v2 specifics, especially the
extra handshake in RDMA mode.
Did you take care of the MPA v2 extended connection
establishment stuff?

I'll have a look asap; I am just down with a nice
COVID infection. This is to let you know I am not ignoring
it, but have another interesting experience which takes
most of my time 😉. Will come back to it asap!

Bernard.


> As requested I'm only sending isolated chunks for easier review.
> 
> This is the first chunk, addressing deadlocks in siw_connect().
> 
> The RDMA application layer expects rdma_connect() to be non-blocking,
> as the completion is handled via RDMA_CM_EVENT_ESTABLISHED and
> other async events. It's not unlikely that the caller holds a lock
> during the rdma_connect() call.
> 
> Without this, a connection attempt to a non-existent/unreachable
> server blocks until the very long TCP timeout hits.
> The application layer has no chance to run its own timeout handler,
> as that would just deadlock with the already blocking rdma_connect().
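
As a userspace analogy only (this is not siw or iwcm code), the behaviour the cover letter asks for can be sketched with plain POSIX sockets: the connect is started without blocking, and the caller later waits for completion with a timeout of its own choosing. The helper names `start_connect_nonblocking()` and `finish_connect()` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: start a TCP connect without blocking the caller.
 * Returns 0 if already connected, 1 if the connect is in progress,
 * or a negative errno value on failure.
 */
int start_connect_nonblocking(int fd, const struct sockaddr_in *addr)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -errno;
	if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
		return 0;
	return errno == EINPROGRESS ? 1 : -errno;
}

/*
 * Hypothetical helper: wait for the pending connect to finish, using the
 * caller's own timeout instead of the TCP stack's long default.
 * Returns 0 on success, -ETIMEDOUT on timeout, or another negative errno.
 */
int finish_connect(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int err = 0;
	socklen_t len = sizeof(err);
	int n;

	assert(timeout_ms >= 0);
	n = poll(&pfd, 1, timeout_ms);
	if (n == 0)
		return -ETIMEDOUT;
	if (n < 0)
		return -errno;
	/* SO_ERROR carries the final result of the asynchronous connect. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -errno;
	return -err;
}
```

Because the first call returns right away, a caller holding a lock can release it before waiting, and a timeout handler can give up on the attempt without contending with a connect call that is still blocked.
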
> 
> First, rdma_connect() holds id_priv->handler_mutex, which deadlocks
> rdma_destroy_id().
> 
> Second, iw_cm_connect(), called from within rdma_connect(), sets
> IWCM_F_CONNECT_WAIT during the call to cm_id->device->ops.iw_connect(),
> siw_connect() in this case. This means that iw_cm_disconnect()
> and iw_destroy_cm_id() will both deadlock waiting for
> IWCM_F_CONNECT_WAIT to be cleared.
> 
> Patch 1: Fixes a refcounting problem
> 
> Patches 2-7: Introduce __siw_cep_terminate_upcall(),
> making the upcall handling much more consistent and handling
> more state combinations.
> 
> Patches 8-13 are preparation patches to siw_connect()
> in order to do the real non-blocking split in Patch 14.
> 
> Please have a look.
> 
> Thanks!
> metze
> 
> Fixed issues in v2:
> - Include more preparation patches related to __siw_cep_terminate_upcall()
>   based on review from Cheng Xu <chengyou@xxxxxxxxxxxxxxxxx>
> 
> Stefan Metzmacher (14):
>   rdma/siw: remove superfluous siw_cep_put() from siw_connect() error
>     path
>   rdma/siw: make siw_cm_upcall() a noop without valid 'id'
>   rdma/siw: split out a __siw_cep_terminate_upcall() function
>   rdma/siw: use __siw_cep_terminate_upcall() for indirect
>     SIW_CM_WORK_CLOSE_LLP
>   rdma/siw: use __siw_cep_terminate_upcall() for SIW_CM_WORK_PEER_CLOSE
>   rdma/siw: use __siw_cep_terminate_upcall() for SIW_CM_WORK_MPATIMEOUT
>   rdma/siw: handle SIW_EPSTATE_CONNECTING in
>     __siw_cep_terminate_upcall()
>   rdma/siw: make use of kernel_{bind,connect,listen}()
>   rdma/siw: let siw_connect() set AWAIT_MPAREP before
>     siw_send_mpareqrep()
>   rdma/siw: create a temporary copy of private data
>   rdma/siw: use error and out logic at the end of siw_connect()
>   rdma/siw: start mpa timer before calling siw_send_mpareqrep()
>   rdma/siw: call the blocking kernel_bindconnect() just before
>     siw_send_mpareqrep()
>   rdma/siw: implement non-blocking connect.
> 
>  drivers/infiniband/sw/siw/siw_cm.c | 347 ++++++++++++++++++-----------
>  drivers/infiniband/sw/siw/siw_cm.h |   1 +
>  2 files changed, 224 insertions(+), 124 deletions(-)
> 
> --
> 2.34.1




