On 1/15/2019 3:53 AM, Alex Rosenbaum wrote:
The RoCE v2.0 UDP source port is used to provide per-conversation entropy for network routers (ECMP), load balancers and 802.3ad link aggregation switching that are not aware of RoCE headers. We will use the following hash function definitions to calculate the RoCE UDP src_port according to the service type (QP Type) and Service ID (SID), or according to the source and destination QPN values where CM services are not used.
Won't this result in the source port being the same for all connections to a given remote service? In other words, all NFS/RDMA connections to a given server will have the same src_port, and therefore will all stack up in the same router queues? Similarly for SMB, iSER, etc.
UDP source port selection must adhere to the IANA port allocation ranges. Thus we will use the IANA-recommended ephemeral port range: 49152-65535, or in hex: 0xC000-0xFFFF.
I thought this was already true. In any case, definitely yes because firewalls will potentially block arbitrary traffic.

Tom.
As this RFC does not modify any wire format or change any protocol, I don't think it requires an IBTA specification. The calculations below take into account the importance of producing a symmetric hash result, so that we can support symmetric hash calculation in network elements.

Hash Calculation for RDMA IP CM Service
=======================================

For RDMA IP CM Services, based on QP1 usage and connected RC QPs using the RDMA IP CM SID, the RoCEv2 UDP source port will be calculated according to the IBTA CM REQ private data info and Service ID.

Hash Function Calculations:
Extract the following fields from the CM IP REQ:
  CM_REQ.ServiceID.DstPort [2 Bytes]
  CM_REQ.PrivateData.SrcPort [2 Bytes]

  RoCE.UDP.src_port = (DstPort[0..1] XOR SrcPort[0..1]) OR 0xC000

The result of the above hash will be kept in the CM connection context and will be used throughout its lifetime for all subsequent CM messages on both ends of the connection (including REP, REJ, DREQ, DREP, ..).

Once the connection is established, the corresponding connected RC QPs on both ends of the connection will update their context with the calculated RDMA IP CM Service based UDP src_port hash, at the Connect phase on the active side and at the Accept phase on the passive side of the connection.

CM will provide the calculated RoCEv2 UDP src_port hash result (16 bit) in the 'uint16_t dlid' field of 'struct ibv_ah_attr'. The 'struct ibv_ah_attr' is passed to the provider library when calling ibv_modify_qp(qp, ah_attr, attr_mask |= IBV_QP_AV), so that providers supporting RoCEv2 will use the 'dlid' value to update the RoCEv2 UDP src_port in the QP context.

Hash Calculation for non-RDMA CM Service ID
===========================================

For non-CM QPs we'll use the input arguments: src.QP[24bit], dst.QP[24bit].
As QP[] is an array of 3 bytes, we'll define a 'Folded QP' value as follows. For src.QP and dst.QP perform:
  QP[0] ^= QP[2];
Folded.QP now resides in QP[0..15].
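The CM-based hash and the QP folding step described above can be sketched in C roughly as follows. This is only an illustration, not the proposed patch: the function names roce_cm_udp_sport() and roce_fold_qpn() are hypothetical, ports are assumed to be in host byte order, and QP[0] is assumed to be the least significant byte of the 24-bit QPN (the RFC text does not state the byte order).

```c
#include <stdint.h>

/* Hypothetical helper: RoCEv2 UDP source port for RDMA IP CM connections.
 * XOR of the two ports is symmetric, so both ends compute the same value;
 * OR with 0xC000 forces the result into the range 0xC000-0xFFFF. */
static uint16_t roce_cm_udp_sport(uint16_t dst_port, uint16_t src_port)
{
	return (uint16_t)((dst_port ^ src_port) | 0xC000u);
}

/* Hypothetical helper: fold a 24-bit QPN into 16 bits.
 * Assumption: QP[0] is the least significant byte, so "QP[0] ^= QP[2]"
 * XORs the top byte into the low byte and keeps bytes 0 and 1. */
static uint16_t roce_fold_qpn(uint32_t qpn)
{
	qpn &= 0xFFFFFFu;                      /* QPNs are 24 bits wide */
	return (uint16_t)((qpn & 0xFFFFu) ^ (qpn >> 16));
}
```

Under these assumptions, swapping the two ports gives the same source port, which is what keeps the value identical on the active and passive sides.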
The desired hash function is:
  UDP src_port = (s.QP[0..15] XOR d.QP[0..15]) OR 0xC000;

When the RoCEv2 UDP source port is set depends on the QP type:

For RC QP: Set on QP creation in the QP Context through the provider library.
  If (s.QP != d.QP)
    QPC.entropy = (Folded.s.QP XOR Folded.d.QP) OR 0xC000
  Else
    QPC.entropy = Folded.s.QP OR 0xC000

For UD QP: Set per post send by the provider library in the WQ Element.
  If (s.QP != d.QP) && (d.QP != 0xFFFFFF)
    WQE.entropy = (Folded.s.QP XOR Folded.d.QP) OR 0xC000
  Else
    WQE.entropy = Folded.s.QP OR 0xC000

Based on feedback and agreement we will submit patches to the kernel and rdma-core for review on the ML.

Signed-off-by: Alex Rosenbaum <alexr@xxxxxxxxxxxx>
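The per-QP-type selection for non-CM QPs described above might look like the following C sketch. The function names are hypothetical and do not come from the kernel or rdma-core. Two assumptions are made: QP[0] is the least significant byte of the QPN, and the UD case ORs with 0xC000 just like the RC case and the general formula, since that is what keeps the result inside 0xC000-0xFFFF.

```c
#include <stdint.h>

#define ROCE_QPN_MASK      0xFFFFFFu  /* QPNs are 24 bits wide */
#define ROCE_QPN_MULTICAST 0xFFFFFFu  /* d.QP value excluded in the UD case */
#define ROCE_SPORT_BASE    0xC000u    /* start of the 0xC000-0xFFFF range */

/* Fold a 24-bit QPN into 16 bits (assumes QP[0] is the low byte). */
static uint16_t fold_qpn(uint32_t qpn)
{
	qpn &= ROCE_QPN_MASK;
	return (uint16_t)((qpn & 0xFFFFu) ^ (qpn >> 16));
}

/* RC QP: value stored once in the QP context. */
static uint16_t roce_rc_udp_sport(uint32_t s_qpn, uint32_t d_qpn)
{
	uint16_t hash = fold_qpn(s_qpn);

	if ((s_qpn & ROCE_QPN_MASK) != (d_qpn & ROCE_QPN_MASK))
		hash ^= fold_qpn(d_qpn);
	return (uint16_t)(hash | ROCE_SPORT_BASE);
}

/* UD QP: value computed per post send and written into the WQE. */
static uint16_t roce_ud_udp_sport(uint32_t s_qpn, uint32_t d_qpn)
{
	uint16_t hash = fold_qpn(s_qpn);

	d_qpn &= ROCE_QPN_MASK;
	if ((s_qpn & ROCE_QPN_MASK) != d_qpn && d_qpn != ROCE_QPN_MULTICAST)
		hash ^= fold_qpn(d_qpn);
	return (uint16_t)(hash | ROCE_SPORT_BASE);
}
```

With these definitions, swapping source and destination QPNs yields the same port for RC, and a UD send to 0xFFFFFF falls back to the source QPN alone.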