Hi Willy,

Thanks for your prompt reply. I will check the additional fixes you
mention and get back to you shortly.

I have just submitted the 4 patches that are necessary, and that I have
tested with, to resolve the crashes we are getting in fib6_clean_all()
and in fib6_del() in the 3.10 kernel:

2c861cc65ef4 ("ipv6: don't call fib6_run_gc() until routing is ready")
b7b1bfce0bb6 ("ipv6: split duplicate address detection and router solicitation timer")
c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing to workqueue")
a9ed4a2986e1 ("ipv6: fix rtnl locking in setsockopt for anycast and multicast")

The 1st one (from v3.11) is a self-contained clean patch to resolve a
crash in fib6_clean_all().

The 2nd, 3rd and 4th patches need to be applied in the order above.
Apologies, for some reason I had to resend the 1st and 2nd patches.
Together they resolve a painful kernel BUG in
net/ipv6/ip6_fib.c:fib6_purge_rt() that we hit in some use cases because
of an invalid ref count for rt6_info. The underlying cause is a set of
locking problems, which are resolved in the 3rd patch (from v3.14). The
2nd patch (from v3.11) is a prerequisite that avoids having to rework the
3rd, and the 4th patch (from v3.14) is for completeness, so as to bring
the rtnl locking code here in line with the 3.14 code, where this issue
is not observed.

Stack trace for the refcnt issue:

[ 236.941008] kernel BUG at net/ipv6/ip6_fib.c:660!
[ 236.950191] [<ffffffffa01c7190>] ? fib6_del+0x270/0x340 [ipv6]
[ 236.950191] [<ffffffffa01c7260>] ? fib6_del+0x340/0x340 [ipv6]
[ 236.950191] [<ffffffffa01c50a0>] ? ip6_route_cleanup+0x60/0x60 [ipv6]
[ 236.950191] [<ffffffffa01c72be>] ? fib6_clean_node+0x5e/0xd0 [ipv6]
[ 236.950191] [<ffffffffa01c52a6>] ? fib6_walk_continue+0x186/0x1c0 [ipv6]
[ 236.950191] [<ffffffffa01c5331>] ? fib6_walk+0x51/0xb0 [ipv6]
[ 236.950191] [<ffffffff81482079>] ? _raw_write_lock_bh+0x9/0x20
[ 236.950191] [<ffffffffa01c747c>] ? fib6_clean_all+0x8c/0xc0 [ipv6]
[ 236.950191] [<ffffffffa01c7260>] ? fib6_del+0x340/0x340 [ipv6]
[ 236.950191] [<ffffffffa01bfb60>] ? fib6_remove_prefsrc+0x50/0x50 [ipv6]
[ 236.950191] [<ffffffffa01c4c97>] ? rt6_ifdown+0x27/0xc0 [ipv6]
[ 236.950191] [<ffffffffa01bd2e8>] ? addrconf_ifdown+0x38/0x410 [ipv6]

Thanks
Mike Manning

On 12/16/2016 10:40 AM, Willy Tarreau wrote:
> Hi Mike,
>
> On Fri, Dec 16, 2016 at 10:16:12AM +0000, Mike Manning wrote:
>> From: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx>
>>
>> commit c15b1ccadb323ea50023e8f1cca2954129a62b51 upstream.
>>
>> addrconf_join_solict and addrconf_join_anycast may cause actions which
>> need rtnl locked, especially on first address creation.
> (...)
>
> Thanks, I'm fine with merging these patches, but a quick check tells me
> that at least the first one caused some issues that were later fixed,
> for example :
>
> From 43a43b6040165f7b40b5b489fe61a4cb7f8c4980 Mon Sep 17 00:00:00 2001
> From: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx>
> Date: Mon, 31 Mar 2014 20:14:10 +0200
> Subject: [PATCH] ipv6: some ipv6 statistic counters failed to disable bh
>
> After commit c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify
> processing to workqueue") some counters are now updated in process context
> and thus need to disable bh before doing so, otherwise deadlocks can
> happen on 32-bit archs. Fabio Estevam noticed this while while mounting
> a NFS volume on an ARM board.
>
> Can you please have a quick check to ensure that all necessary fixes
> that come with these two patches are also identified ? I'll then queue
> them all at once.
>
> Thanks!
> Willy