Re: [PATCH 2/2] ksmbd: smbd: handle RDMA CM time wait event

On 6/14/2022 10:14 PM, Hyunchul Lee wrote:
On Tue, Jun 14, 2022 at 8:56 PM, Tom Talpey <tom@xxxxxxxxxx> wrote:


On 6/13/2022 7:01 PM, Hyunchul Lee wrote:
After a QP has been disconnected, it stays
in a timewait state for in flight packets.
After the state has completed,
RDMA_CM_EVENT_TIMEWAIT_EXIT is reported.
Disconnect on RDMA_CM_EVENT_TIMEWAIT_EXIT
so that ksmbd can restart.

Signed-off-by: Hyunchul Lee <hyc.lee@xxxxxxxxx>
---
   fs/ksmbd/transport_rdma.c | 1 +
   1 file changed, 1 insertion(+)

diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
index d035e060c2f0..4b1a471afcd0 100644
--- a/fs/ksmbd/transport_rdma.c
+++ b/fs/ksmbd/transport_rdma.c
@@ -1535,6 +1535,7 @@ static int smb_direct_cm_handler(struct rdma_cm_id *cm_id,
               wake_up_interruptible(&t->wait_status);
               break;
       }
+     case RDMA_CM_EVENT_TIMEWAIT_EXIT:
       case RDMA_CM_EVENT_DEVICE_REMOVAL:
       case RDMA_CM_EVENT_DISCONNECTED: {
               t->status = SMB_DIRECT_CS_DISCONNECTED;

Is this issue seen on all RDMA providers? Because I would normally
expect that an RDMA_CM_EVENT_DISCONNECTED will precede the TIMEWAIT
event. What scenarios have you seen this not occur?


There was an issue where ksmbd got stuck after attempting to shut down.
We are trying to reproduce it but have not managed to yet.
It seems to be related to the TIMEWAIT event.

I don't think it's appropriate to add this case to SMB. I think it's
quite unlikely that it will address anything, because an RDMA provider
must have indicated a CM_EVENT_DISCONNECTED prior to any TIMEWAIT.
So, the QP (and connection) will already have been torn down by ksmbd
at the earlier event. Perhaps ksmbd did not properly drain the QP at
the initial disconnect.
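
To illustrate the ordering argument above, here is a minimal user-space sketch, not ksmbd code. The enum values, struct transport, teardown() and handle_event() are all hypothetical stand-ins for the real RDMA CM events and SMB Direct state. It assumes the provider reports a disconnect before any timewait exit, so teardown happens once at the disconnect and the later event has nothing left to do.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RDMA CM events discussed in this thread. */
enum cm_event {
	EV_ESTABLISHED,
	EV_DISCONNECTED,
	EV_DEVICE_REMOVAL,
	EV_TIMEWAIT_EXIT,
};

/* Hypothetical connection states, loosely modelled on t->status. */
enum conn_status { CS_NEW, CS_CONNECTED, CS_DISCONNECTED };

struct transport {
	enum conn_status status;
	int teardown_count;	/* how many times teardown actually ran */
};

/* Tear the connection down once; repeated calls are no-ops. */
void teardown(struct transport *t)
{
	if (t->status == CS_DISCONNECTED)
		return;
	t->status = CS_DISCONNECTED;
	t->teardown_count++;
}

/* Returns true if the event changed or tore down the connection. */
bool handle_event(struct transport *t, enum cm_event ev)
{
	switch (ev) {
	case EV_ESTABLISHED:
		t->status = CS_CONNECTED;
		return true;
	case EV_DEVICE_REMOVAL:
	case EV_DISCONNECTED:
		/* Teardown is triggered here, at the first disconnect. */
		teardown(t);
		return true;
	case EV_TIMEWAIT_EXIT:
		/* Assumed to arrive only after a disconnect: ignored. */
		return false;
	}
	return false;
}
```

Under that assumption, feeding the sketch EV_ESTABLISHED, EV_DISCONNECTED and then EV_TIMEWAIT_EXIT runs teardown exactly once, at the disconnect. A hang at shutdown would then point at the teardown path itself rather than at a missing case label.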

> And other drivers such as nvme have disconnected on the TIMEWAIT event.

NVME is a completely different upper layer, and has different client/
server transport behavior. The SMB session insulates its peers from
most transport errors, and should not be requesting timewait for
its connections, and definitely not waiting for timewait to expire
before initiating teardown (or recovery). The NFS/RDMA client and
server ignore this event, btw.

Unless ksmbd wishes to reuse its QPs, which is not currently the
case (right?), there's pretty much no reason to manage QP state and
hang around for TIMEWAIT.

Right, ksmbd doesn't reuse QPs.

Then there appears to be no good justification for the change. Sorry,
but it's a NAK from me.

Tom.


