Yes, in this case we are getting CLIENT_REREGISTER event and will do
light flush, which marks all path records as invalid and reconnects all
multicast groups. I think it would be too much to do a normal flush, as
all interfaces in the network will go down on a simple subnet manager
change.
Light flush seems to be suitable for this case, the problem is that
validity of path records is checked only on outgoing unicast ARPs. My
impression is that correct solution that also covers UD case is to check
validity also on outgoing ICMP6 Neighbor Advertisement packets. I can
try to prepare a patch that implements that.
This patch, however, still looks useful to me, as checking path on neigh
creation allows to detect such situations in connected mode faster, not
waiting for ARP cache expiry time.
On 09.05.2018 16:31, Doug Ledford wrote:
On Fri, 2018-05-04 at 13:49 +0200, Evgenii Smirnov wrote:
Currently, the validity of a path is checked only on
unicast ARP transmission. If Subnet Manager switchover happens and
some LIDs get reassigned, driver in a network that uses only IPv6
addresses will not try to renew the path records, despite them
being marked as invalid.
Are we not getting a flush event in this case? If we are getting a
flush event, maybe we just aren't doing a heavy enough flush?
In general I don't have a problem for this patch, but I would prefer to
find a solution that resolves the UD case too, and maybe that just needs
to flush harder on the specific event we get when we get a new SM (it's
a rereg event, yes?).
In connected mode, remote side LID change will cause send to fail,
freeing the corresponding neigh struct. Subsequent packets to this
destination will trigger allocation of a new neigh struct.
With this patch allocation of new neigh struct will also check the
validity of the associated path and renew it if necessary.
This, however, will not help in datagram mode, if the host
continuously sends data to the destination with invalid path.
The neigh struct alive timer will be updated, thus preventing
it from reallocation.
Test setup consists of two target hosts and two initiator hosts,
one of the initiators is with the patch. All hosts have only IPv6
addresses from the same subnet and initiators constantly ping targets.
In connected mode swapping the LIDs of target hosts and switching over SM
leads to the loss of connectivity for the initiator without the patch.
Initiator with the patch recovers in ~3 sec. In datagram mode initiator
with the patch is able to recover only if ping is stopped for
neigh_obsolete time.
Signed-off-by: Evgenii Smirnov<evgenii.smirnov@xxxxxxxxxxxxxxxx>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 161ba8c76285..db5762d62aea 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -963,7 +963,7 @@ static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr,
list_add_tail(&neigh->list, &path->neigh_list);
- if (path->ah) {
+ if (path->ah && path->valid) {
kref_get(&path->ah->ref);
neigh->ah = path->ah;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html