Re: [PATCH v2 3/4] ibacm: Unable to resurrect an interface

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 29, 2019 at 09:44:05AM +0100, Håkon Bugge wrote:
> When an IB port has been brought back to Active state, after being
> down, ibacm gets an event about it. It will then (re) enumerate the
> devices, and does so by executing an ioctl with SIOCGIFCONF. This
> particular ioctl will only return interfaces that are "running".
> 
> There may be a delay after the IB port becomes Active until its
> address has been provisioned, and becomes "running". If ibacm attempts
> to associate IPoIB interfaces to the port during this interval, it
> will not see the interface because it is not "running".
> 
> Later, when ibacm is asked for a Path Record (PR) using the IP address
> of the resurrected IPoIB interface, it will not be able to find the
> associated end-point (EP), and the following is printed in the log:
> 
> acm_svr_resolve_path: notice - unknown local end point address
> 
> The bug can be provoked by the following script. We have a single HCA
> with two ports, the IPoIB interfaces are named stib{0,1}, the IP
> address of the first interface is 192.168.200.200, and the remote IP
> address is 192.168.200.202. The LID of the IB switch is 1 and the
> switch port number connected to port 1 of the HCA is 22.
> 
> <script>
>  #!/bin/bash
> 
> ibportstate -P 2 1 22 disable
> 
>  # move the IP address
> ip addr del 192.168.200.200/24 dev stib0
> ip addr add 192.168.200.200/24 dev stib1
> 
> ibportstate -P 2 1 22 enable
> 
>  # Wait until port becomes active again
> while [[ $(ibstat|grep State:|grep -c Active) != 2 ]]; do
>    echo -n "."; sleep 1
> done
> echo
> 
>  # give ibacm time to re-enumerate the interfaces
> sleep 1
> 
>  # move the IP address back
> ip addr del 192.168.200.200/24 dev stib1
> ip addr add 192.168.200.200/24 dev stib0
> 
>  # take the other port down, so we are sure we use stib0
> ip link set down dev stib1
> 
>  # Start a utility requesting a PR
> qperf 192.168.200.202 -cm1 rc_bw
> 
>  # check for failure
> grep "unknown local end point" ibacm.log
> 
>  # restore the state
> ip link set up dev stib1
> </script>
> 
> This fix depends on the commit that re-factors the use of ioctl in
> acm_if_iter(), and instead uses netlink. Now, by reducing the
> requirements of the state of the interface, the EP is added, and
> afterwards, when an address is assigned, it is associated with the EP.
> 
> This commit is a new implementation of
> https://patchwork.kernel.org/patch/10748357, which was NAKed.
> 
> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>

Reviewed-by: Ira Weiny <ira.weiny@xxxxxxxxx>

> 
> v1 -> v2:
>    * Removed Gerrit's Change-Id tag (Håkon)
> ---
>  ibacm/src/acm_util.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/ibacm/src/acm_util.c b/ibacm/src/acm_util.c
> index fecb6c89..8807579d 100644
> --- a/ibacm/src/acm_util.c
> +++ b/ibacm/src/acm_util.c
> @@ -180,19 +180,13 @@ static void acm_if_iter(struct nl_object *obj, void *_ctx_and_cb)
>  	uint16_t pkey;
>  	int addr_len;
>  	char *label;
> -	int flags;
> -	int ret;
>  	int af;
>  
>  	link = rtnl_link_get(link_cache, rtnl_addr_get_ifindex(addr));
> -	flags = rtnl_link_get_flags(link);
>  
>  	if (rtnl_link_get_arptype(link) != ARPHRD_INFINIBAND)
>  		return;
>  
> -	if (!(flags & IFF_RUNNING))
> -		return;
> -
>  	if (!(a = rtnl_addr_get_local(addr)))
>  		return;
>  
> @@ -206,20 +200,18 @@ static void acm_if_iter(struct nl_object *obj, void *_ctx_and_cb)
>  		return;
>  
>  	label = rtnl_addr_get_label(addr);
> -	if (!label)
> -		return;
>  
>  	link_addr = rtnl_link_get_addr(link);
> +	/* gid has a 4 byte offset into the link address */
>  	memcpy(sgid.raw, nl_addr_get_binary_addr(link_addr) + 4, sizeof(sgid));
>  
> -	ret = acm_if_get_pkey(rtnl_link_get_name(link), &pkey);
> -	if (ret)
> +	if (acm_if_get_pkey(rtnl_link_get_name(link), &pkey))
>  		return;
>  
>  	acm_log(2, "name: %5s label: %9s index: %2d flags: %s addr: %s pkey: 0x%04x guid: 0x%lx\n",
>  		rtnl_link_get_name(link), label,
>  		rtnl_addr_get_ifindex(addr),
> -		rtnl_link_flags2str(flags, flags_str, sizeof(flags_str)),
> +		rtnl_link_flags2str(rtnl_link_get_flags(link), flags_str, sizeof(flags_str)),
>  		nl_addr2str(a, ip_str, sizeof(ip_str)),	pkey,
>  		be64toh(sgid.global.interface_id));
>  
> -- 
> 2.20.1
> 



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux