On Thu, Jul 16, 2020 at 01:41:58PM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@xxxxxxxxxxxx>
>
> Made the check for duplicate/stale CM more strict by adding comparison
> for remote communication ID field.
>
> The absence of such strict check causes to the following flows not being
> handled properly:
> 1. Client tries to setup more than one connection with same QP in
>    a server (e.g., when use external QP), the client would reject
>    the reply.

This is correct and required behavior by IBA:

  12.9.8.3.1 REQ RECEIVED / REP RECEIVED (RC, UC, XRC)
  A CM may receive a REQ/REP specifying a remote QPN in “REQ:local
  QPN”/“REP:local QPN” that the CM already considers connected to a
  local QP.

> 2. Client node reboots, and when it gets the same QP number as that
>    of before the reboot time, the server would rejects the request.

IBA has a specific flow that should be followed for this case:

  When a CM receives such a REQ/REP it shall abort the connection
  establishment by issuing REJ to the REQ/REP. It shall then issue
  DREQ, with “DREQ:remote QPN” set to the remote QPN from the
  REQ/REP, until DREP is received or Max Retries is exceeded, and
  place the local QP in the TimeWait state.

The fundamental issue is that IBA does not include the source QPN in any
RC packet. It is the responsibility of each end port to ensure that only
one RC QP is set up to send traffic to a given DLID/DQPN pair. This is
what the timewait mechanism and these checks in the CM are for.

Proper use of the CM ensures that the local QP targeting the DLID/DQPN
is destroyed before the cm_id enters timewait, and timewait then
prevents a new QP from being established with the same parameters until
the network has flushed all packets related to the old sending QP.

Jason
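
For illustration only, the stale-connection flow quoted from IBA above can be modelled roughly as follows. This is a toy sketch, not the kernel's ib_cm code: the class `ToyCM`, the `Action` names, the `(remote_port, remote_qpn)` key and the 4 second timewait value are all assumptions made up for the example. It only shows the ordering: a REQ naming a remote QPN that is already considered connected gets REJ plus DREQ and the old entry moves to TimeWait, and nothing with the same parameters is accepted until TimeWait has expired.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    ACCEPT = auto()        # continue connection establishment
    REJ_STALE = auto()     # REJ the REQ, send DREQ, old local QP -> TimeWait
    REJ_TIMEWAIT = auto()  # same parameters still in TimeWait, refuse


@dataclass
class ToyCM:
    """Toy model of the duplicate/stale check (hypothetical, not ib_cm)."""
    timewait_secs: float = 4.0                     # assumed value
    connected: dict = field(default_factory=dict)  # key -> local QPN
    timewait: dict = field(default_factory=dict)   # key -> expiry time

    def on_req(self, remote_port, remote_qpn, local_qpn, now):
        # RC packets carry no source QPN, so the remote port and remote
        # QPN are the only identity available for the sender.
        key = (remote_port, remote_qpn)

        expiry = self.timewait.get(key)
        if expiry is not None:
            if now < expiry:
                # Old packets may still be in flight; do not reuse yet.
                return Action.REJ_TIMEWAIT
            del self.timewait[key]

        if key in self.connected:
            # Remote QPN already considered connected: abort with REJ,
            # tear down the old connection, and start TimeWait.
            del self.connected[key]
            self.timewait[key] = now + self.timewait_secs
            return Action.REJ_STALE

        self.connected[key] = local_qpn
        return Action.ACCEPT
```

In this model a rebooted client that reuses its old QP number is first rejected as stale, and it can connect again only once the TimeWait period for the old connection has run out.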