Re: 【Question for IPv6 and RoCEv1】

On 2018/11/9 21:08, Parav Pandit wrote:
> Hi Oulijun,
>
>> -----Original Message-----
>> From: oulijun <oulijun@xxxxxxxxxx>
>> Sent: Friday, November 9, 2018 2:57 AM
>> To: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Parav Pandit
>> <parav@xxxxxxxxxxxx>
>> Cc: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
>> Subject: 【Question for IPv6 and RoCEv1】
>>
>> Hi Parav Pandit & Jason Gunthorpe,
>>
>>    I am testing RoCE using RoCEv1 and IPv6 on two hip08
>> environments.
>>
>>    The operations are as follows:
>>    hip08 #1: first, I add a VLAN device with the commands
>> 		vconfig add eth0 100
>> 		ifconfig eth0.100 193.168.1.1
>>              second, I configure the IPv6 address with the command
>> 		ifconfig eth0.100 add fe80::189e:4bff:fe42:2965/64
>>
>>    hip08 #2: I run the same operations:
>> 		first, vconfig add eth0 100
>> 		ifconfig eth0.100 193.168.1.2
>> 		second, ifconfig eth0.100 add fe80::1000:22ff:fe10:5923/64
>>
>>     Next, I used perftest to test RoCE, and it failed.
>>     Run on the server: ./ib_send_bw -n 5 -x 6 &
>>     Run on the client: ./ib_send_bw -n 5 -x 6 193.168.1.1 &
>>
>>     I analyzed the process flow and have a question. The failure happens
>> because the wrong dmac is obtained during modify QP.
>>
>>     I traced the following code:
>>     static int ib_resolve_unicast_gid_dmac(struct ib_device *device,
>> 				       struct rdma_ah_attr *ah_attr)
>>    {
>> 	struct ib_global_route *grh = rdma_ah_retrieve_grh(ah_attr);
>> 	const struct ib_gid_attr *sgid_attr = grh->sgid_attr;
>> 	int hop_limit = 0xff;
>> 	int ret = 0;
>>
>> 	/* If destination is link local and source GID is RoCEv1,
>> 	 * IP stack is not used.
>> 	 */
>> 	if (rdma_link_local_addr((struct in6_addr *)grh->dgid.raw) &&
>> 	    sgid_attr->gid_type == IB_GID_TYPE_ROCE) {
>> 		rdma_get_ll_mac((struct in6_addr *)grh->dgid.raw,
>> 				ah_attr->roce.dmac);
>> 		return ret;
>> 	}
>>
>> 	ret = rdma_addr_find_l2_eth_by_grh(&sgid_attr->gid, &grh->dgid,
>> 					   ah_attr->roce.dmac,
>> 					   sgid_attr, &hop_limit);
>>
>> 	grh->hop_limit = hop_limit;
>> 	return ret;
>> }
>>
>> When addr->s6_addr32[0] is configured as 0xfe80 (a link-local address),
>> the dmac is derived from grh->dgid.raw.
>> However, grh->dgid.raw is converted from the IP address.
>>
> Yes. This is incorrect. There are a few things wrong with regard to default GIDs, RoCEv1 and destination resolution.
>
> 1. RoCE default GIDs are constructed from the MAC address and not from the port GUID. (spec violation, section 3.5.10)
> 2. Two default GIDs, v1 and v2, are constructed for those HCAs which support it.
> Nothing is wrong there, but as a side effect, RoCE Annex standard A16.5.1 cannot be followed ("resolving destination by standard ARP or ND").
> 3. In order to adhere to A16.5.1 (that is, to always call rdma_addr_find_l2_eth_by_grh(), regardless of GID type), GIDs have to be based on IP addresses.
> This contradicts the base IB spec, which has the GID based on the port GUID.
> Doing so will also make the IB stack rely on IPv6 functionality, which is not a problem, but then the default GID has to be constructed from the IP address and not the MAC address.
> I have seen Ubuntu platforms where the link-local IPv6 address by default is not a function of the MAC address.
>
> The fix requires:
> (a) Disable the two default GIDs and have just one default RoCEv1 GID at index 0 for legacy, created from the port GUID (not the MAC)
> (b) Keep the RoCEv2-based default GID slot empty so that legacy applications that have wrongly made GID index assumptions can still work with other GID indices
> (c) Let the IPv6 RoCEv2 GID be added based on the IP address scheme at an index other than 1
> (d) Always call rdma_addr_find_l2_eth_by_grh() and do not check whether the GID type is v1 or v2
> (e) Do not allow traffic to the RoCEv1 default GID (continue to allow it on the IB default GID for IB ports)
>
> This will work in a sane manner and it will resolve the problem you describe.
> However, it breaks some ABI: because of (c), the GID table will look different than it does today. I hear resistance to doing that, hence I left it in its current state.
>
> As another alternative, Jason suggested not constructing RoCEv1 GIDs based on IP addresses. But the hns driver uses them, and if I recall correctly RoCEv2 support was added only recently, so some deployments might still be relying on RoCEv1.
> I am not sure, so I was reluctant to make this change.
Yes, you are right.
> So it was decided to allow disabling it via a netlink command at the user's discretion, since the user would then know about the table changes.
> I do have code to disable RoCEv1 GIDs, but there are issues in enabling them again, so I am holding off.
>
>> Why? Does it not allow users to configure ipv6 at will?
>>
> Yes, due to the above issues, RoCEv1 GIDs are problematic.
> If the HCA supports RoCEv2, please use RoCEv2 with IPv6. It will work as expected.

Thanks, I see.
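
To illustrate the behaviour discussed above, here is a minimal user-space sketch of how a destination MAC can be derived directly from a link-local GID, without consulting the IP stack. It follows my reading of what rdma_link_local_addr() and rdma_get_ll_mac() do in the quoted code path, but it is not the kernel source: the helper names ll_addr_is_link_local() and ll_addr_to_mac() are hypothetical, and the sketch assumes the lower 64 bits of the address are a modified EUI-64 interface ID built from the MAC.

```c
#include <assert.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if addr lies in fe80::/64, i.e. the
 * first eight bytes are fe80:0000:0000:0000. */
static int ll_addr_is_link_local(const struct in6_addr *addr)
{
	static const uint8_t prefix[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };

	return memcmp(addr->s6_addr, prefix, sizeof(prefix)) == 0;
}

/* Hypothetical helper: reverses the modified EUI-64 mapping.
 * Assumption: bytes 8..15 of the address were built from a MAC as
 * m0^2 m1 m2 ff fe m3 m4 m5. If the address was assigned by hand or
 * by a privacy scheme, the result is NOT the peer's real MAC. */
static void ll_addr_to_mac(const struct in6_addr *addr, uint8_t mac[6])
{
	assert(addr != NULL && mac != NULL);

	memcpy(mac, &addr->s6_addr[8], 3);      /* upper three MAC bytes */
	memcpy(mac + 3, &addr->s6_addr[13], 3); /* skip ff:fe, lower three */
	mac[0] ^= 2;                            /* flip universal/local bit */
}
```

With the address from the first host above, fe80::189e:4bff:fe42:2965, this mapping gives 1a:9e:4b:42:29:65. If eth0.100 on that host has a different MAC, for example because the IPv6 address was configured manually, the dmac computed this way does not match the peer, which is consistent with the failure reported in this thread.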
