Patch "net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group" has been added to the 5.4-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     net-nexthop-release-ipv6-per-cpu-dsts-when-replacing.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 15aeda700c2d6d67593be734a979dc005a399cc4
Author: Nikolay Aleksandrov <nikolay@xxxxxxxxxx>
Date:   Mon Nov 22 17:15:13 2021 +0200

    net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
    
    [ Upstream commit 1005f19b9357b81aa64e1decd08d6e332caaa284 ]
    
    When replacing a nexthop group, we must release the IPv6 per-cpu dsts of
    the removed nexthop entries after an RCU grace period because they
    contain references to the nexthop's net device and to the fib6 info.
    With specific series of events[1] we can reach net device refcount
    imbalance which is unrecoverable. IPv4 is not affected because dsts
    don't take a refcount on the route.
    
    [1]
     $ ip nexthop list
      id 200 via 2002:db8::2 dev bridge.10 scope link onlink
      id 201 via 2002:db8::3 dev bridge scope link onlink
      id 203 group 201/200
     $ ip -6 route
      2001:db8::10 nhid 203 metric 1024 pref medium
         nexthop via 2002:db8::3 dev bridge weight 1 onlink
         nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink
    
    Create rt6_info through one of the multipath legs, e.g.:
     $ taskset -a -c 1  ./pkt_inj 24 bridge.10 2001:db8::10
     (pkt_inj is just a custom packet generator, nothing special)
    
    Then remove that leg from the group by replace (let's assume it is id
    200 in this case):
     $ ip nexthop replace id 203 group 201
    
    Now remove the IPv6 route:
     $ ip -6 route del 2001:db8::10/128
    
    The route won't be really deleted due to the stale rt6_info holding 1
    refcnt in nexthop id 200.
    At this point we have the following reference count dependency:
     (deleted) IPv6 route holds 1 reference over nhid 203
     nh 203 holds 1 ref over id 201
     nh 200 holds 1 ref over the net device and the route due to the stale
     rt6_info
    
    Now to create circular dependency between nh 200 and the IPv6 route, and
    also to get a reference over nh 200, restore nhid 200 in the group:
     $ ip nexthop replace id 203 group 201/200
    
    And now we have a permanent circular dependncy because nhid 203 holds a
    reference over nh 200 and 201, but the route holds a ref over nh 203 and
    is deleted.
    
    To trigger the bug just delete the group (nhid 203):
     $ ip nexthop del id 203
    
    It won't really be deleted due to the IPv6 route dependency, and now we
    have 2 unlinked and deleted objects that reference each other: the group
    and the IPv6 route. Since the group drops the reference it holds over its
    entries at free time (i.e. its own refcount needs to drop to 0) that will
    never happen and we get a permanent ref on them, since one of the entries
    holds a reference over the IPv6 route it will also never be released.
    
    At this point the dependencies are:
     (deleted, only unlinked) IPv6 route holds reference over group nh 203
     (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
     nh 200 holds 1 ref over the net device and the route due to the stale
     rt6_info
    
    This is the last point where it can be fixed by running traffic through
    nh 200, and specifically through the same CPU so the rt6_info (dst) will
    get released due to the IPv6 genid, that in turn will free the IPv6
    route, which in turn will free the ref count over the group nh 203.
    
    If nh 200 is deleted at this point, it will never be released due to the
    ref from the unlinked group 203, it will only be unlinked:
     $ ip nexthop del id 200
     $ ip nexthop
     $
    
    Now we can never release that stale rt6_info, we have IPv6 route with ref
    over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with
    rt6_info (dst) with ref over the net device and the IPv6 route. All of
    these objects are only unlinked, and cannot be released, thus they can't
    release their ref counts.
    
     Message from syslogd@dev at Nov 19 14:04:10 ...
      kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
     Message from syslogd@dev at Nov 19 14:04:20 ...
      kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
    
    Fixes: 7bf4796dd099 ("nexthops: add support for replace")
    Signed-off-by: Nikolay Aleksandrov <nikolay@xxxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 8bdffdf0502a1..4d69b3de980a6 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -839,15 +839,36 @@ static void remove_nexthop(struct net *net, struct nexthop *nh,
 /* if any FIB entries reference this nexthop, any dst entries
  * need to be regenerated
  */
-static void nh_rt_cache_flush(struct net *net, struct nexthop *nh)
+static void nh_rt_cache_flush(struct net *net, struct nexthop *nh,
+			      struct nexthop *replaced_nh)
 {
 	struct fib6_info *f6i;
+	struct nh_group *nhg;
+	int i;
 
 	if (!list_empty(&nh->fi_list))
 		rt_cache_flush(net);
 
 	list_for_each_entry(f6i, &nh->f6i_list, nh_list)
 		ipv6_stub->fib6_update_sernum(net, f6i);
+
+	/* if an IPv6 group was replaced, we have to release all old
+	 * dsts to make sure all refcounts are released
+	 */
+	if (!replaced_nh->is_group)
+		return;
+
+	/* new dsts must use only the new nexthop group */
+	synchronize_net();
+
+	nhg = rtnl_dereference(replaced_nh->nh_grp);
+	for (i = 0; i < nhg->num_nh; i++) {
+		struct nh_grp_entry *nhge = &nhg->nh_entries[i];
+		struct nh_info *nhi = rtnl_dereference(nhge->nh->nh_info);
+
+		if (nhi->family == AF_INET6)
+			ipv6_stub->fib6_nh_release_dsts(&nhi->fib6_nh);
+	}
 }
 
 static int replace_nexthop_grp(struct net *net, struct nexthop *old,
@@ -994,7 +1015,7 @@ static int replace_nexthop(struct net *net, struct nexthop *old,
 		err = replace_nexthop_single(net, old, new, extack);
 
 	if (!err) {
-		nh_rt_cache_flush(net, old);
+		nh_rt_cache_flush(net, old, new);
 
 		__remove_nexthop(net, new, NULL);
 		nexthop_put(new);



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux