On Tue, Feb 20, 2018 at 09:14:41AM +0100, Dmitry Vyukov wrote:
> On Tue, Feb 20, 2018 at 8:56 AM, Tommi Rantala
> <tommi.t.rantala@xxxxxxxxx> wrote:
> > On 19.02.2018 20:59, Dmitry Vyukov wrote:
> >>
> >> On Sat, Feb 3, 2018 at 1:15 PM, Xin Long <lucien.xin@xxxxxxxxx> wrote:
> >>>>>
> >>>>> On 1/30/18 1:57 PM, David Ahern wrote:
> >>>>>>
> >>>>>> On 1/30/18 1:08 PM, Daniel Borkmann wrote:
> >>>>>>>
> >>>>>>> On 01/30/2018 07:32 PM, Cong Wang wrote:
> >>>>>>>>
> >>>>>>>> On Tue, Jan 30, 2018 at 4:09 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >>>>>>>>>
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> The following program creates a hang in unregister_netdevice.
> >>>>>>>>> cleanup_net work hangs there forever periodically printing
> >>>>>>>>> "unregister_netdevice: waiting for lo to become free. Usage count = 3"
> >>>>>>>>> and creation of any new network namespaces hangs forever.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Interestingly, this is not reproducible on net-next.
> >>>>>>>
> >>>>>>>
> >>>>>>> The most recent change on netns refcnt was 4ee806d51176
> >>>>>>> ("net: tcp: close sock if net namespace is exiting") in
> >>>>>>> net/net-next from 5 days ago, maybe fixed due to that?
> >>>>>>>
> >>>>>>
> >>>>>> This appears to be the commit introducing the refcnt leak:
> >>>>>>
> >>>>>> $ git bisect bad
> >>>>>> dbc2b5e9a09e9a6664679a667ff81cff6e5f2641 is the first bad commit
> >>>>>> commit dbc2b5e9a09e9a6664679a667ff81cff6e5f2641
> >>>>>> Author: Xin Long <lucien.xin@xxxxxxxxx>
> >>>>>> Date: Fri May 12 14:39:52 2017 +0800
> >>>>>>
> >>>>>>     sctp: fix src address selection if using secondary addresses for ipv6
> >>>>>>
> >>>>>>
> >>>>>> v4.14 is bad. Running bisect in the background while doing other
> >>>>>> things....
> >>>>>>
> >>>>>
> >>>>> Interesting.
> >>>>> The commit that avoids the refcnt leak is
> >>>>>
> >>>>> commit 955ec4cb3b54c7c389a9f830be7d3ae2056b9212
> >>>>> Author: David Ahern <dsahern@xxxxxxxxx>
> >>>>> Date: Wed Jan 24 19:45:29 2018 -0800
> >>>>>
> >>>>>     net/ipv6: Do not allow route add with a device that is down
> >>>>>
> >>>>> That commit does not intentionally address the problem so it is just
> >>>>> masking the problematic code introduced by the commit above.
> >>>>
> >>>> Thanks, David A.
> >>>>
> >>>> I'm still on a trip. will look into this asap.
> >>>
> >>>
> >>> Alexey and Tommi already had the patches for this issue on
> >>> both SCTP v4 and v6 dst_get, Thanks.
> >>
> >>
> >> Is this meant to be fixed already? I am still seeing this on the
> >> latest upstream tree.
> >>
> >
> > These two commits are in v4.16-rc1:
> >
> > commit 4a31a6b19f9ddf498c81f5c9b089742b7472a6f8
> > Author: Tommi Rantala <tommi.t.rantala@xxxxxxxxx>
> > Date: Mon Feb 5 21:48:14 2018 +0200
> >
> >     sctp: fix dst refcnt leak in sctp_v4_get_dst
> >     ...
> >     Fixes: 410f03831 ("sctp: add routing output fallback")
> >     Fixes: 0ca50d12f ("sctp: fix src address selection if using secondary addresses")
> >
> >
> > commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2
> > Author: Alexey Kodanev <alexey.kodanev@xxxxxxxxxx>
> > Date: Mon Feb 5 15:10:35 2018 +0300
> >
> >     sctp: fix dst refcnt leak in sctp_v6_get_dst()
> >     ...
> >     Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using secondary addresses for ipv6")
> >
> >
> > I guess we missed something if it's still reproducible.
> >
> > I can check it later this week, unless someone else beat me to it.
>
> Hi Tommi,
>
> Hmmm, I can't claim that it's exactly the same bug. Perhaps it's
> another one then. But I am still seeing these:
>
> [ 58.799130] unregister_netdevice: waiting for lo to become free.
> Usage count = 4
> [ 60.847138] unregister_netdevice: waiting for lo to become free.
> Usage count = 4
> [ 62.895093] unregister_netdevice: waiting for lo to become free.
> Usage count = 4
> [ 64.943103] unregister_netdevice: waiting for lo to become free.
> Usage count = 4
>
> on upstream tree pulled ~12 hours ago.
>

Can you write a systemtap script to probe dev_hold and dev_put, printing out a
backtrace if the device name matches "lo"? That should tell us definitively
whether the problem is in the same location or not.

Neil

> Kernel does not detect this as any kind of BUG/WARNING, so
> syzkaller/syzbot do not catch it as bug and do not try to reproduce,
> localize and report.
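
The thread notes that the kernel reports this hang only as a repeated log
message, not as a BUG/WARNING, so automated tooling does not pick it up. As a
rough illustration of how a test harness could flag it from captured log text,
here is a minimal Python sketch. It is not part of syzkaller or any tool
mentioned in this thread; the names NetdevWait, find_netdev_waits and
stuck_devices, and the min_repeats threshold, are all hypothetical. The only
thing taken from the thread is the message format itself.

```python
import re
from typing import List, NamedTuple, Optional


class NetdevWait(NamedTuple):
    """One 'waiting for <dev> to become free' report (hypothetical type)."""
    timestamp: Optional[float]  # seconds since boot, if the line carried one
    device: str
    usage_count: int


# Message format as quoted in this thread, with an optional
# "[   58.799130]" style timestamp prefix. The \s* between "free." and
# "Usage" tolerates the message having been wrapped onto a second line.
_PATTERN = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?"
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free\.\s*"
    r"Usage count = (?P<count>-?\d+)"
)


def find_netdev_waits(log_text: str) -> List[NetdevWait]:
    """Return every matching report found in log_text, in order."""
    waits = []
    for m in _PATTERN.finditer(log_text):
        ts = float(m.group("ts")) if m.group("ts") else None
        waits.append(NetdevWait(ts, m.group("dev"), int(m.group("count"))))
    return waits


def stuck_devices(waits: List[NetdevWait], min_repeats: int = 3) -> List[str]:
    """Devices reported at least min_repeats times.

    The threshold is an arbitrary assumption: a single report can be
    transient, while a message repeating every couple of seconds suggests
    a leaked reference.
    """
    counts = {}
    for w in waits:
        counts[w.device] = counts.get(w.device, 0) + 1
    return sorted(dev for dev, n in counts.items() if n >= min_repeats)


if __name__ == "__main__":
    # Log lines copied from the report above.
    sample = (
        "[   58.799130] unregister_netdevice: waiting for lo to become free. Usage count = 4\n"
        "[   60.847138] unregister_netdevice: waiting for lo to become free. Usage count = 4\n"
        "[   62.895093] unregister_netdevice: waiting for lo to become free. Usage count = 4\n"
    )
    print(stuck_devices(find_netdev_waits(sample)))
```

This only detects the symptom. Finding which dev_hold lacks a matching dev_put
still needs the kind of tracing Neil suggests.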