On 6/20/2017 8:07 AM, Håkon Bugge wrote:
> CM REQs cannot be successfully retried, because a new pv_cm_id is
> created for each request, without checking if one already exists.
>
> By checking if an id exists before creating one, the bug is fixed.
>
> This bug can be provoked by running an RDMA CM user-land application,
> but inserting a five seconds delay before the rdma_accept() call on
> the passive side. This delay is larger than the default CMA timeout,
> and triggers a retry from the active side. The retried REQ will use
> another pv_cm_id (the cm_id on the wire). This confuses the CM
> protocol and two REJs are sent from the passive side.
>
> Here is an excerpt from ibdump running without the patch:
>
> 3.285092 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
> 7.382711 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
> 7.382861 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject
> 7.387644 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject
>
> and here is the same with bug fix applied:
>
> 3.251010 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
> 7.349387 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
> 8.258443 LID: 4 -> LID: 4 SDP 290 CM: ConnectReply(SDP Hello)
> 8.259890 LID: 4 -> LID: 4 InfiniBand 290 CM: ReadyToUse
>
> Suggested-by: Venkat Venkatsubra <venkat.x.venkatsubra@xxxxxxxxxx>
> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> Reported-by: Wei Lin Guay <wei.lin.guay@xxxxxxxxxx>
> Tested-by: Wei Lin Guay <wei.lin.guay@xxxxxxxxxx>
> Reviewed-by: Yuval Shaia <yuval.shaia@xxxxxxxxxx>

This was accepted into 4.13-rc, thanks.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG Key ID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
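
For readers following the thread, the "check if an id exists before creating one" idea from the quoted description can be illustrated with a toy model. This is only a sketch and not the kernel patch: the class PvIdMap, its methods, and the helper ids_on_wire are hypothetical names invented here, and a real driver would also need locking, timeouts, and cleanup that are left out.

```python
import itertools


class PvIdMap:
    """Toy table mapping a requester's local id to a paravirtual cm_id.

    Hypothetical model for illustration; not the driver's data structure.
    """

    def __init__(self):
        self._next_id = itertools.count(1)
        self._by_local_id = {}

    def alloc_always(self, local_id):
        # Behaviour described as the bug: every REQ, including a retry,
        # is given a fresh pv_cm_id without looking for an existing one.
        pv_cm_id = next(self._next_id)
        self._by_local_id[local_id] = pv_cm_id
        return pv_cm_id

    def get_or_alloc(self, local_id):
        # Behaviour described as the fix: reuse the id already assigned
        # to this requester, and allocate only when none exists.
        pv_cm_id = self._by_local_id.get(local_id)
        if pv_cm_id is None:
            pv_cm_id = next(self._next_id)
            self._by_local_id[local_id] = pv_cm_id
        return pv_cm_id


def ids_on_wire(handler, local_id, attempts):
    """Return the pv_cm_id used for each (re)transmission of one REQ."""
    return [handler(local_id) for _ in range(attempts)]


buggy = PvIdMap()
fixed = PvIdMap()

# One original REQ plus one retry from the same requester (id 0x10).
buggy_ids = ids_on_wire(buggy.alloc_always, 0x10, 2)
fixed_ids = ids_on_wire(fixed.get_or_alloc, 0x10, 2)
```

In this model the retried REQ carries a different id under alloc_always and the same id under get_or_alloc, which matches the difference between the two ibdump excerpts in spirit only.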