On 2018/11/9 21:08, Parav Pandit wrote:
> Hi Oulijun,
>
>> -----Original Message-----
>> From: oulijun <oulijun@xxxxxxxxxx>
>> Sent: Friday, November 9, 2018 2:57 AM
>> To: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Parav Pandit
>> <parav@xxxxxxxxxxxx>
>> Cc: linux-rdma <linux-rdma@xxxxxxxxxxxxxxx>
>> Subject: 【Question for IPv6 and RoCEv1】
>>
>> Hi, Parav Pandit & Jason Gunthorpe
>>
>> I am testing RoCE with RoCEv1 and IPv6 on two hip08 environments.
>>
>> The operations are as follows:
>> hip08 #1: first, I add a vlan device, the cmd is
>>     vconfig add eth0 100,
>>     ifconfig eth.100 193.168.1.1
>> second, I configure the IPv6 address, the cmd is
>>     ifconfig eth0.100 add fe80::189e:4bff:fe42:2965/64
>>
>> hip08 #2: I run the same operations:
>> first, vconfig add eth0 100
>>     ifconfig eth.100 193.168.1.2
>> second, ifconfig ifconfig eth0.100 add
>>     fe80::1000:22ff:fe10:5923/64
>>
>> Next, I use perftest to test RoCE, and it fails.
>> server side runs: ./ib_send_bw -n 5 -x 6 &
>> client side runs: ./ib_send_bw -n 5 -x 6 193.168.1.1 &
>>
>> I analyzed the process flow and have a question. The reason for the failure
>> is that the wrong dmac is obtained when modifying the qp.
>>
>> I traced the following code:
>> static int ib_resolve_unicast_gid_dmac(struct ib_device *device,
>> 				       struct rdma_ah_attr *ah_attr)
>> {
>> 	struct ib_global_route *grh = rdma_ah_retrieve_grh(ah_attr);
>> 	const struct ib_gid_attr *sgid_attr = grh->sgid_attr;
>> 	int hop_limit = 0xff;
>> 	int ret = 0;
>>
>> 	/* If destination is link local and source GID is RoCEv1,
>> 	 * IP stack is not used.
>> 	 */
>> 	if (rdma_link_local_addr((struct in6_addr *)grh->dgid.raw) &&
>> 	    sgid_attr->gid_type == IB_GID_TYPE_ROCE) {
>> 		rdma_get_ll_mac((struct in6_addr *)grh->dgid.raw,
>> 				ah_attr->roce.dmac);
>> 		return ret;
>> 	}
>>
>> 	ret = rdma_addr_find_l2_eth_by_grh(&sgid_attr->gid, &grh->dgid,
>> 					   ah_attr->roce.dmac,
>> 					   sgid_attr, &hop_limit);
>>
>> 	grh->hop_limit = hop_limit;
>> 	return ret;
>> }
>>
>> When addr->s6_addr32[0] is configured as 0xfe80, the dmac is taken from
>> grh->dgid.raw.
>> However, grh->dgid.raw is converted from the IP address.
>>
> Yes. This is incorrect. There are a few things wrong with regard to default GID, RoCEv1 and destination resolution.
>
> 1. RoCE default GIDs are constructed out of the MAC address and not from the port GUID. (spec violation, section 3.5.10)
> 2. Two default GIDs are constructed, v1 and v2, for those HCAs which support it.
>    Nothing wrong there, but as a side effect, RoCE Annex standard A16.5.1 cannot be followed ("resolving destination by standard ARP or ND").
> 3. In order to adhere to A16.5.1 (that is, to always call rdma_addr_find_l2_eth_by_grh(), regardless of GID type), GIDs have to be based on IP addresses.
>    This contradicts the base IB spec, which has the GID based on the port_GUID.
>    Doing so will also make the IB stack rely on IPv6 functionality, which is not a problem, but then the default GID has to be constructed out of the IP address and not the MAC address.
>    I have seen Ubuntu platforms where the link-local IPv6 address by default is not a function of the MAC address.
>
> Fix required:
> (a) Disable the two default GIDs and just have one default RoCEv1 GID at index 0 for legacy, to be created out of the port GUID (not MAC)
> (b) Keep the RoCEv2 based default GID slot empty so that legacy applications that made wrong GID index assumptions can still work from other GID indices
> (c) Let the IPv6 RoCEv2 GID get added based on the IP address scheme at an index other than 1
> (d) Always call rdma_addr_find_l2_eth_by_grh() and do not check for GID type as v1/v2
> (e) Do not allow traffic to the RoCEv1 default GID (continue to allow it on the IB default GID for IB ports)
>
> This will work in a sane manner and it will resolve the problem you describe.
> However, it breaks some ABI: because of (c), the GID table will look different than today, and I have heard resistance to doing that, hence I left it in the current state.
>
> As another alternative, Jason suggested not constructing RoCEv1 GIDs based on IP addresses. But the hns driver uses it and, if I recall correctly, RoCEv2 support was added only recently, so some deployments might still be relying on RoCEv1.
> I am not sure, so I was reluctant to make this change.

Yes, you are right.

> So it was decided to disable it via a netlink command at the user's will, where he would know about the table changes.
> I do have code to disable RoCEv1 GIDs, but there are issues in enabling them back, so I held off.
>
>> Why? Does it not allow users to configure ipv6 at will?
>>
> Yes, due to the above issues, RoCEv1 GIDs are ill.
> If the HCA supports RoCEv2, please use RoCEv2 with IPv6. It will work as expected.

Thanks, I see.
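
For reference, the link-local shortcut discussed above can be sketched in userspace as follows. This is only an illustration of why the dmac comes out wrong, not the kernel code itself: the names gid_is_link_local() and gid_to_ll_mac() are hypothetical, and the byte handling is assumed to mirror what rdma_link_local_addr() and rdma_get_ll_mac() do, that is, treat the lower 64 bits of an fe80::/64 GID as a modified EUI-64 and reverse it into a MAC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if the 16-byte GID lies in fe80::/64,
 * i.e. its first 8 bytes are fe 80 00 00 00 00 00 00.
 */
static int gid_is_link_local(const uint8_t gid[16])
{
	static const uint8_t prefix[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };

	assert(gid != NULL);
	return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/* Hypothetical helper: reverse a modified EUI-64 interface ID into a MAC.
 * Assumption: this matches the shortcut taken for RoCEv1 link-local GIDs.
 * No check is made that gid[11..12] really are ff:fe.
 */
static void gid_to_ll_mac(const uint8_t gid[16], uint8_t mac[6])
{
	assert(gid != NULL && mac != NULL);
	memcpy(mac, &gid[8], 3);	/* first three MAC bytes */
	memcpy(mac + 3, &gid[13], 3);	/* skip gid[11..12] (ff:fe in EUI-64) */
	mac[0] ^= 0x02;			/* flip the universal/local bit back */
}
```

With the address from the report, fe80::189e:4bff:fe42:2965, this yields 1a:9e:4b:42:29:65, which is only the right dmac if the peer's MAC really is that value. A link-local address that was not built from the MAC, for example fe80::1, yields 02:00:00:00:00:01, which no ARP or ND lookup would have returned. That is the failure mode behind fix (d).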