Hi Devesh, > -----Original Message----- > From: Devesh Sharma [mailto:devesh.sharma@xxxxxxxxxxxx] > Sent: Wednesday, October 18, 2017 11:32 PM > To: Parav Pandit <parav@xxxxxxxxxxxx> > Cc: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>; Somnath Kotur > <somnath.kotur@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Matan Barak > <matanb@xxxxxxxxxxxx>; Jason Gunthorpe > <jgunthorpe@xxxxxxxxxxxxxxxxxxxx>; Roland Dreier > <roland@xxxxxxxxxxxxxxx> > Subject: Re: Need to set if_index in ib_init_ah_from_wc() ? > > Hello Parav, > > Thanks for you note here. I can supply you a fix, but before that please find a > query inline below: That will be good. More inline below. > > On Thu, Oct 19, 2017 at 12:30 AM, Parav Pandit <parav@xxxxxxxxxxxx> wrote: > > Hi Devesh, Selvin, Somnath, > > > >> -----Original Message----- > >> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma- > >> owner@xxxxxxxxxxxxxxx] On Behalf Of Jason Gunthorpe > >> Sent: Monday, January 30, 2017 1:54 PM > >> To: Roland Dreier <roland@xxxxxxxxxxxxxxx> > >> Cc: linux-rdma@xxxxxxxxxxxxxxx; Matan Barak <matanb@xxxxxxxxxxxx> > >> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ? > >> > >> On Sun, Jan 29, 2017 at 05:21:43PM -0800, Roland Dreier wrote: > >> > I recently debugged an issue where an RDMA-CM / RoCE connection to > >> > a link-local IPv6 address takes 1 second to establish. The delay > >> > comes when the passive side cm_req_handler() calls > >> > cm_init_av_for_response(), which calls ib_init_ah_from_wc(). > >> > > >> > That calls into rdma_addr_find_l2_eth_by_grh() with the link-local > >> > address, but with 0 passed in through the if_index parameter. So > >> > > >> > The fix seems as simple as adding > >> > > >> > if_index = idev->ifindex; > >> > > >> > to ib_init_ah_from_wc() before the call to > >> > rdma_addr_find_l2_eth_by_grh() - this can't fail, since we return > >> > an error if finding idev fails. > >> > >> Hum, it is certainly a big conceptual mistake to not anchor a route > >> lookup in a netdev at this point. > >> > >> Both roceev1 and v2 need to use the netdev that was used to create > >> the UD QP to anchor these lookups, that netdev should be a child of > >> some kind from the get_netdev (eg a vlan, or macvlan, etc) > >> > >> This is necessary to make namespaces and other subtle things work properly. > >> > >> So, right idea, wrong netdev. > >> > >> > However, I'm not sure if it's quite that simple - how does this > >> > interact with RoCEv2, or other addressing like IPv4 or > >> > non-link-local IPv6? I'm not sure I understand what the cases where > >> > resolved_dev will be != idev, or if passing in idev->ifindex will break this. > >> > >> I think this brokenness fell out of the above confusion, it should be > >> removed, rdma_addr_find_dmac_by_grh should not be callable without a > proper netdev. > >> > >> The != case can happen if the gid table resolve differently than the > >> route table, but it is conceptually wrong to lookup in the gid table > >> before doing a route lookup, and I think these apis are poorly designed to > enforce that requirement. > >> > >> Jason > > > > I have fixed this issue where rdma_addr_find_l2_eth_by_grh() accepts netdev > of the GID. > > Derivation of GID is based on vlan, and DGID of the incoming GRH, > network_type fields. > > Finding out the right GID depends above fields from the cached gid table. > > > > Before I post the patch, > > We found that bnx_re driver doesn't set IB_WC_WITH_VLAN in the QP1 wc. > > Fail to set vlan id and IB_WC_WITH_VLAN flag when incoming QP1 packet in > vlan, will fail to search the right GID. > > I am under impression that IB_WC_WITH_VLAN flags was supposed to be set > only if any vendor's WC reports VLAN-ID, otherwise it had to be derived from > somewhere else. Are we moving away from this notion? Yes. That is correct. Its optional. However vlan defines L2 segments. Two L2 segments can have same IP address and therefore GIDs. So Ignoring vlan is not good because ib_find_gid_index and friend functions won't be able to resolve them, which performs vlan + SGID exact lookup. As discussed in past thread, we like to find the matching SGID first based on GRH + available L2 field such as vlan, and refer to the ndev of the SGID. So my upcoming patch brings such fundamental fix that resolves IPv6 link local issue and few other similar cases. ��.n��������+%������w��{.n�����{���fk��ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f