On Thu, Jun 16, 2022 at 3:53 AM Tom Talpey <tom@xxxxxxxxxx> wrote:
>
>
> On 6/14/2022 10:14 PM, Hyunchul Lee wrote:
> > On Tue, Jun 14, 2022 at 8:56 PM Tom Talpey <tom@xxxxxxxxxx> wrote:
> >>
> >>
> >> On 6/13/2022 7:01 PM, Hyunchul Lee wrote:
> >>> After a QP has been disconnected, it stays
> >>> in a timewait state for in flight packets.
> >>> After the state has completed,
> >>> RDMA_CM_EVENT_TIMEWAIT_EXIT is reported.
> >>> Disconnect on RDMA_CM_EVENT_TIMEWAIT_EXIT
> >>> so that ksmbd can restart.
> >>>
> >>> Signed-off-by: Hyunchul Lee <hyc.lee@xxxxxxxxx>
> >>> ---
> >>>  fs/ksmbd/transport_rdma.c | 1 +
> >>>  1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
> >>> index d035e060c2f0..4b1a471afcd0 100644
> >>> --- a/fs/ksmbd/transport_rdma.c
> >>> +++ b/fs/ksmbd/transport_rdma.c
> >>> @@ -1535,6 +1535,7 @@ static int smb_direct_cm_handler(struct rdma_cm_id *cm_id,
> >>>  		wake_up_interruptible(&t->wait_status);
> >>>  		break;
> >>>  	}
> >>> +	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
> >>>  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
> >>>  	case RDMA_CM_EVENT_DISCONNECTED: {
> >>>  		t->status = SMB_DIRECT_CS_DISCONNECTED;
> >>
> >> Is this issue seen on all RDMA providers? Because I would normally
> >> expect that an RDMA_CM_EVENT_DISCONNECTED will precede the TIMEWAIT
> >> event. What scenarios have you seen this not occur?
> >>
> >
> > There was an issue that ksmbd got stuck after attempting to shutdown.
> > We are trying to reproduce it, but we haven't reproduced it yet,
> > but It seems to be related to the TIMEWAIT event.
>
> I don't think it's appropriate to add this case to SMB. I think it's
> quite unlikely that it will address anything, because an RDMA provider
> must have indicated a CM_EVENT_DISCONNECTED prior to any TIMEWAIT.
> So, the QP (and connection) will already have been torn down by ksmbd
> at the earlier event. Perhaps ksmbd did not properly drain the QP at
> the initial disconnect.
>
> > And other drivers such as nvme have disconnected on the TIMEWAIT event.
>
> NVME is a completely different upper layer, and has different client/
> server transport behavior. The SMB session insulates its peers from
> most transport errors, and should not be requesting timewait for
> its connections, and definitely not waiting for timewait to expire
> before initiating teardown (or recovery). The NFS/RDMA client and
> server ignore this event, btw.
>

Okay, I got it. I am looking for the cause and have found some clues.

> >> Unless ksmbd wishes to reuse its QP's, which is not currently the
> >> case (right?), there's pretty much no reason to manage QP state and
> >> hang around for TIMEWAIT.
> >
> > Right, ksmbd doesn't reuse QP.
>
> Then there appears to be no good justification for the change. Sorry,
> but it's a NAK from me.
>

Thank you very much for the detailed explanation.

> Tom.

--
Thanks,
Hyunchul
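
For reference, the event ordering argued in the review can be sketched as a small userspace model. This is only an illustration under stated assumptions, not ksmbd code: every `demo_*` / `DEMO_*` name is hypothetical and merely mirrors the roles of the RDMA CM events and the SMB_DIRECT_CS_DISCONNECTED status from the patch. It assumes, as the review states, that a DISCONNECTED event always precedes TIMEWAIT_EXIT, so teardown happens at the disconnect and the later timewait event has nothing left to do.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CM events named in the thread.
 * These are NOT the kernel's enum rdma_cm_event_type values. */
enum demo_cm_event {
	DEMO_EVENT_ESTABLISHED,
	DEMO_EVENT_DISCONNECTED,
	DEMO_EVENT_DEVICE_REMOVAL,
	DEMO_EVENT_TIMEWAIT_EXIT,
};

/* Hypothetical connection states, loosely modelled on the transport status. */
enum demo_status {
	DEMO_CS_NEW,
	DEMO_CS_CONNECTED,
	DEMO_CS_DISCONNECTED,
};

struct demo_transport {
	enum demo_status status;
	int teardown_count;	/* how many times teardown was triggered */
};

/* Tear down on disconnect or device removal; ignore TIMEWAIT_EXIT. */
static void demo_cm_handler(struct demo_transport *t, enum demo_cm_event ev)
{
	switch (ev) {
	case DEMO_EVENT_ESTABLISHED:
		t->status = DEMO_CS_CONNECTED;
		break;
	case DEMO_EVENT_DEVICE_REMOVAL:
	case DEMO_EVENT_DISCONNECTED:
		/* Trigger teardown once, even if both events arrive. */
		if (t->status != DEMO_CS_DISCONNECTED) {
			t->status = DEMO_CS_DISCONNECTED;
			t->teardown_count++;
		}
		break;
	case DEMO_EVENT_TIMEWAIT_EXIT:
		/* Assumed to follow a disconnect; nothing left to tear down. */
		break;
	}
}

/* Replay a sequence of events and return the number of teardowns. */
static int demo_replay(const enum demo_cm_event *evs, size_t n)
{
	struct demo_transport t = { DEMO_CS_NEW, 0 };
	size_t i;

	for (i = 0; i < n; i++)
		demo_cm_handler(&t, evs[i]);
	return t.teardown_count;
}
```

Replaying ESTABLISHED, DISCONNECTED, TIMEWAIT_EXIT through `demo_replay()` triggers teardown exactly once, at the disconnect. In this model, adding TIMEWAIT_EXIT to the disconnect case would not change that result, which is the point made in the review.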