RE: 【Question for IPv6 and RoCEv1】

Hi Oulijun,

> -----Original Message-----
> From: oulijun <oulijun@xxxxxxxxxx>
> Sent: Friday, November 9, 2018 2:57 AM
> To: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Parav Pandit
> <parav@xxxxxxxxxxxx>
> Cc: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
> Subject: 【Question for IPv6 and RoCEv1】
> 
> Hi Parav Pandit & Jason Gunthorpe,
> 
>    I am testing RoCE with RoCEv1 and IPv6 on two hip08
> environments.
> 
>    The operations are as follows:
>    hip08 #1: first, I added a VLAN device:
> 		vconfig add eth0 100
> 		ifconfig eth0.100 193.168.1.1
>              second, I configured the IPv6 address:
> 		ifconfig eth0.100 add fe80::189e:4bff:fe42:2965/64
> 
>    hip08 #2: I ran the same operations:
> 		first, vconfig add eth0 100
> 		ifconfig eth0.100 193.168.1.2
>  		second, ifconfig eth0.100 add fe80::1000:22ff:fe10:5923/64
> 
>     Next, I used perftest to test RoCE, and the test failed.
>     On the server: ./ib_send_bw -n 5 -x 6 &
>     On the client: ./ib_send_bw -n 5 -x 6 193.168.1.1 &
> 
>     I analyzed the process flow and have a question. The failure occurs
> because the wrong DMAC is obtained when modifying the QP.
> 
>     I traced the following code:
> static int ib_resolve_unicast_gid_dmac(struct ib_device *device,
> 				       struct rdma_ah_attr *ah_attr)
> {
> 	struct ib_global_route *grh = rdma_ah_retrieve_grh(ah_attr);
> 	const struct ib_gid_attr *sgid_attr = grh->sgid_attr;
> 	int hop_limit = 0xff;
> 	int ret = 0;
> 
> 	/* If destination is link local and source GID is RoCEv1,
> 	 * IP stack is not used.
> 	 */
> 	if (rdma_link_local_addr((struct in6_addr *)grh->dgid.raw) &&
> 	    sgid_attr->gid_type == IB_GID_TYPE_ROCE) {
> 		rdma_get_ll_mac((struct in6_addr *)grh->dgid.raw,
> 				ah_attr->roce.dmac);
> 		return ret;
> 	}
> 
> 	ret = rdma_addr_find_l2_eth_by_grh(&sgid_attr->gid, &grh->dgid,
> 					   ah_attr->roce.dmac,
> 					   sgid_attr, &hop_limit);
> 
> 	grh->hop_limit = hop_limit;
> 	return ret;
> }
> 
> When the destination is link-local (addr->s6_addr32[0] begins with 0xfe80), the DMAC is
> taken directly from grh->dgid.raw.
> However, grh->dgid.raw is converted from the IP address.
> 
Yes, this is incorrect. There are a few things wrong with regard to the default GIDs, RoCEv1, and destination resolution.

1. RoCE default GIDs are constructed from the MAC address rather than from the port GUID (a spec violation; see section 3.5.10).
2. Two default GIDs, v1 and v2, are constructed for HCAs that support both.
Nothing is wrong with that in itself, but as a side effect, RoCE Annex A16.5.1 ("resolving destination by standard ARP or ND") cannot be followed.
3. To adhere to A16.5.1 (that is, to always call rdma_addr_find_l2_eth_by_grh() regardless of GID type), GIDs have to be based on IP addresses.
This contradicts the base IB spec, under which the GID is based on the port GUID.
Doing so would also make the IB stack rely on IPv6 functionality. That is not a problem in itself, but then the default GID has to be constructed from the IP address and not from the MAC address,
because the two do not always match: I have seen Ubuntu platforms where the link-local IPv6 address is, by default, not a function of the MAC address.

The fix requires:
(a) Disable the two default GIDs and keep just one default RoCEv1 GID at index 0 for legacy use, created from the port GUID (not the MAC).
(b) Keep the RoCEv2 default GID slot empty, so that legacy applications that wrongly assume particular GID indices can still work from other GID indices.
(c) Let the IPv6 RoCEv2 GID be added, based on the IP address scheme, at an index other than 1.
(d) Always call rdma_addr_find_l2_eth_by_grh(), without checking whether the GID type is v1 or v2.
(e) Do not allow traffic on the RoCEv1 default GID (continue to allow it on the IB default GID for IB ports).

This would work in a sane manner, and it would resolve the problem you describe.
However, it breaks some ABI: because of (c), the GID table would look different from today's, and I have heard resistance to changing that, so I left it in its current state.

As an alternative, Jason suggested not constructing RoCEv1 GIDs from IP addresses. But the hns driver uses them, and if I recall correctly, RoCEv2 support was added only recently, so some deployments might still rely on RoCEv1.
I am not sure, so I was reluctant to make this change.
Instead, it was decided to let users disable them via a netlink command, at their own discretion, so that they know the GID table will change.
I do have code to disable RoCEv1 GIDs, but there are issues with re-enabling them, so I am holding off.

> Why? Are users not allowed to configure IPv6 at will?
>
Yes. Due to the issues above, RoCEv1 GIDs are broken in this case.
If the HCA supports RoCEv2, please use RoCEv2 with IPv6; it will work as expected.

