Re: [PATCH 3.10] ipv6: move DAD and addrconf_verify processing to workqueue

Hi Willy,
For the 4 fixes below, as you indicate there are some follow-up bugfixes:

43a43b604016 ("ipv6: some ipv6 statistic counters failed to disable bh") from v3.15
751eb6b6042a ("ipv6: addrconf: fix dev refcont leak when DAD failed") from v4.8

I will look at backporting these too. (TBH, we have never hit the 2nd of these with 4.x (x<8) kernels, despite extensive testing in the area of DAD failure.) As this also involves retesting, there will be some delay, but I expect to send them later today.
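
For context on the first of these (43a43b604016): its commit message, quoted further down, says that counters updated in process context need bh disabled first, otherwise deadlocks can happen on 32-bit archs. It does not spell out the mechanism. My understanding is that 64-bit counters on 32-bit archs are guarded by a sequence counter, and the toy Python model below sketches one general hazard of that scheme: a reader that runs while a write section is still open can never obtain a stable snapshot. `SeqCounter64` and its methods are hypothetical names made up for illustration; this is not the kernel implementation and not necessarily the exact failure Fabio hit.

```python
class SeqCounter64:
    """Toy model of a counter guarded by a sequence counter.

    Assumption: this mirrors the general seqcount idea only; it is
    not kernel code and the names are hypothetical.
    """

    def __init__(self):
        self.seq = 0    # even: stable, odd: a write is in progress
        self.value = 0

    def write_begin(self):
        self.seq += 1   # sequence becomes odd

    def write_end(self):
        self.seq += 1   # sequence becomes even again

    def add(self, n):
        """Complete, uninterrupted update."""
        self.write_begin()
        self.value += n
        self.write_end()

    def try_read(self, max_retries=1000):
        """Return a stable snapshot, or None if none was seen.

        A real reader would spin without a bound; the bound here
        only keeps the demonstration finite.
        """
        for _ in range(max_retries):
            start = self.seq
            if start % 2:           # writer active, retry
                continue
            snapshot = self.value
            if self.seq == start:   # nothing changed while reading
                return snapshot
        return None


counter = SeqCounter64()
counter.add(5)
before = counter.try_read()     # stable read between updates

counter.write_begin()           # update starts ...
counter.value += 3
during = counter.try_read()     # ... reader runs before write_end()
counter.write_end()
after = counter.try_read()      # stable again once the write closed
```

In this model `during` is `None`: the reader gives up because the sequence stays odd until the writer resumes. If the reader is a context that preempted the writer on the same CPU and spins without a bound, the writer never resumes. Keeping the interrupting context out for the duration of the update, which is what disabling bh does for softirqs, closes that window.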

Thanks
Mike


On 12/16/2016 10:51 AM, Mike Manning wrote:
> Hi Willy,
> Thanks for your prompt reply.
> 
> I will check the additional fixes you mention and get back to you shortly.
> 
> I have just submitted the 4 patches that we found necessary, and have tested with, to resolve the crashes we are getting in fib6_clean_all() and in fib6_del() in the 3.10 kernel:
> 
> 2c861cc65ef4 ("ipv6: don't call fib6_run_gc() until routing is ready")
> b7b1bfce0bb6 ("ipv6: split duplicate address detection and router solicitation timer")
> c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing to workqueue")
> a9ed4a2986e1 ("ipv6: fix rtnl locking in setsockopt for anycast and multicast")
> 
> The 1st one (from v3.11) is a self-contained clean patch to resolve a crash in fib6_clean_all().
> 
> The 2nd, 3rd and 4th patches need to be applied in the order above (apologies, for some reason I had to resend the 1st and 2nd patches). They resolve a painful kernel BUG in net/ipv6/ip6_fib.c:fib6_purge_rt() that we hit in some use cases due to an invalid ref count for rt6_info. The underlying cause is a locking problem, which is resolved by the 3rd patch (from v3.14).
> 
> The 2nd patch (from v3.11) is a prerequisite that avoids having to rework the 3rd patch, and the 4th patch (from v3.14) is for completeness, so as to bring the rtnl locking code here in line with the 3.14 code, where this issue is not observed.
> 
> Stack trace for the refcnt issue:
> 
> [  236.941008] kernel BUG at net/ipv6/ip6_fib.c:660!
> 
> [  236.950191]  [<ffffffffa01c7190>] ? fib6_del+0x270/0x340 [ipv6]
> [  236.950191]  [<ffffffffa01c7260>] ? fib6_del+0x340/0x340 [ipv6]
> [  236.950191]  [<ffffffffa01c50a0>] ? ip6_route_cleanup+0x60/0x60 [ipv6]
> [  236.950191]  [<ffffffffa01c72be>] ? fib6_clean_node+0x5e/0xd0 [ipv6]
> [  236.950191]  [<ffffffffa01c52a6>] ? fib6_walk_continue+0x186/0x1c0 [ipv6]
> [  236.950191]  [<ffffffffa01c5331>] ? fib6_walk+0x51/0xb0 [ipv6]
> [  236.950191]  [<ffffffff81482079>] ? _raw_write_lock_bh+0x9/0x20
> [  236.950191]  [<ffffffffa01c747c>] ? fib6_clean_all+0x8c/0xc0 [ipv6]
> [  236.950191]  [<ffffffffa01c7260>] ? fib6_del+0x340/0x340 [ipv6]
> [  236.950191]  [<ffffffffa01bfb60>] ? fib6_remove_prefsrc+0x50/0x50 [ipv6]
> [  236.950191]  [<ffffffffa01c4c97>] ? rt6_ifdown+0x27/0xc0 [ipv6]
> [  236.950191]  [<ffffffffa01bd2e8>] ? addrconf_ifdown+0x38/0x410 [ipv6]
> 
> Thanks
> Mike Manning
> 
> On 12/16/2016 10:40 AM, Willy Tarreau wrote:
>> Hi Mike,
>>
>> On Fri, Dec 16, 2016 at 10:16:12AM +0000, Mike Manning wrote:
>>> From: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx>
>>>
>>> commit c15b1ccadb323ea50023e8f1cca2954129a62b51 upstream.
>>>
>>> addrconf_join_solict and addrconf_join_anycast may cause actions which
>>> need rtnl locked, especially on first address creation.
>> (...)
>>
>> Thanks, I'm fine with merging these patches, but a quick check tells me
>> that at least the first one caused some issues that were later fixed,
>> for example :
>>
>>   From 43a43b6040165f7b40b5b489fe61a4cb7f8c4980 Mon Sep 17 00:00:00 2001
>>   From: Hannes Frederic Sowa <hannes@xxxxxxxxxxxxxxxxxxx>
>>   Date: Mon, 31 Mar 2014 20:14:10 +0200
>>   Subject: [PATCH] ipv6: some ipv6 statistic counters failed to disable bh
>>   
>>   After commit c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify
>>   processing to workqueue") some counters are now updated in process context
>>   and thus need to disable bh before doing so, otherwise deadlocks can
>>   happen on 32-bit archs. Fabio Estevam noticed this while while mounting
>>   a NFS volume on an ARM board.
>>
>> Can you please have a quick check to ensure that all necessary fixes
>> that come with these two patches are also identified ? I'll then queue
>> them all at once.
>>
>> Thanks!
>> Willy
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



