On Thu, Apr 12, 2018 at 8:15 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Wed, Feb 21, 2018 at 3:53 PM, Tommi Rantala
> <tommi.t.rantala@xxxxxxxxx> wrote:
>> On 20.02.2018 18:26, Neil Horman wrote:
>>>
>>> On Tue, Feb 20, 2018 at 09:14:41AM +0100, Dmitry Vyukov wrote:
>>>>
>>>> On Tue, Feb 20, 2018 at 8:56 AM, Tommi Rantala
>>>> <tommi.t.rantala@xxxxxxxxx> wrote:
>>>>>
>>>>> On 19.02.2018 20:59, Dmitry Vyukov wrote:
>>>>>>
>>>>>> Is this meant to be fixed already? I am still seeing this on the
>>>>>> latest upstream tree.
>>>>>>
>>>>>
>>>>> These two commits are in v4.16-rc1:
>>>>>
>>>>> commit 4a31a6b19f9ddf498c81f5c9b089742b7472a6f8
>>>>> Author: Tommi Rantala <tommi.t.rantala@xxxxxxxxx>
>>>>> Date: Mon Feb 5 21:48:14 2018 +0200
>>>>>
>>>>> sctp: fix dst refcnt leak in sctp_v4_get_dst
>>>>> ...
>>>>> Fixes: 410f03831 ("sctp: add routing output fallback")
>>>>> Fixes: 0ca50d12f ("sctp: fix src address selection if using secondary addresses")
>>>>>
>>>>>
>>>>> commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2
>>>>> Author: Alexey Kodanev <alexey.kodanev@xxxxxxxxxx>
>>>>> Date: Mon Feb 5 15:10:35 2018 +0300
>>>>>
>>>>> sctp: fix dst refcnt leak in sctp_v6_get_dst()
>>>>> ...
>>>>> Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using secondary addresses for ipv6")
>>>>>
>>>>>
>>>>> I guess we missed something if it's still reproducible.
>>>>>
>>>>> I can check it later this week, unless someone else beat me to it.
>>>>
>>>>
>>>> Hi Tommi,
>>>>
>>>> Hmmm, I can't claim that it's exactly the same bug. Perhaps it's
>>>> another one then. But I am still seeing these:
>>>>
>>>> [ 58.799130] unregister_netdevice: waiting for lo to become free. Usage count = 4
>>>> [ 60.847138] unregister_netdevice: waiting for lo to become free. Usage count = 4
>>>> [ 62.895093] unregister_netdevice: waiting for lo to become free. Usage count = 4
>>>> [ 64.943103] unregister_netdevice: waiting for lo to become free. Usage count = 4
>>>>
>>>> on upstream tree pulled ~12 hours ago.
>>>>
>>> Can you write a systemtap script to probe dev_hold, and dev_put, printing
>>> out a backtrace if the device name matches "lo". That should tell us
>>> definitively if the problem is in the same location or not
>>
>>
>> Hi Dmitry, I tested with the reproducer and the kernel .config file that you
>> sent in the first email in this thread:
>>
>> With 4.16-rc2 unable to reproduce.
>>
>> With 4.15-rc9 bug reproducible, and I get "unregister_netdevice: waiting for
>> lo to become free. Usage count = 3"
>>
>> With 4.15-rc9 and Alexey's "sctp: fix dst refcnt leak in sctp_v6_get_dst()"
>> cherry-picked on top, unable to reproduce.
>>
>>
>> Is syzkaller doing something else now to trigger the bug...?
>> Can you still trigger the bug with the same reproducer?
>
> Hi Neil, Tommi,
>
> Reviving this old thread about "unregister_netdevice: waiting for lo
> to become free. Usage count = 3" hangs.
> I still did not have time to deep dive into what happens there (too
> many bugs coming from syzbot). But this still actively happens and I
> suspect accounts to a significant portion of various hang reports,
> which are quite unpleasant.
>
> One idea that could make it all simpler:
>
> Is this wait loop in netdev_wait_allrefs() supposed to wait for any
> prolonged periods of time under any non-buggy conditions? E.g. more
> than 1-2 minutes?
> If it only supposed to wait briefly for things that already supposed
> to be shutting down, and we add a WARNING there after some timeout,
> then syzbot will report all info how/when it happens, hopefully
> extracting reproducers, and all the nice things.
> But this WARNING should not have any false positives under any
> realistic conditions (e.g. waiting for arrival of remote packets with
> large timeouts).
>
> Looking at some task hung reports, it seems that this code holds some
> mutexes, takes workqueue thread and prevents any progress with
> destruction of other devices (and net namespace creation/destruction),
> so I guess it should not wait for any indefinite periods of time?

I'm working on this currently:
https://bugs.launchpad.net/ubuntu/zesty/+source/linux/+bug/1711407

I added a summary of what I've found to be the cause (or at least, one
possible cause) of this:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/comments/72

I'm working on a patch to work around the main side-effect of this,
which is hanging while holding the global net mutex. Hangs will still
happen (e.g. if a dst leaks) but should not affect anything else,
other than a leak of the dst and its net namespace.

Fixing the dst leaks is important too, of course, but a dst leak (or
other cause) shouldn't break the entire system.
--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
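
For illustration only, the "WARNING after some timeout" idea quoted above
can be modeled in user space. This is a rough Python sketch, not kernel
code and not the patch mentioned above. wait_allrefs(), warn_after,
poll_interval and the now/sleep/warn hooks are hypothetical names made up
for this example, and the real netdev_wait_allrefs() logic is not
reproduced here.

```python
import time


def wait_allrefs(get_refcnt, warn_after=60.0, poll_interval=0.25,
                 now=time.monotonic, sleep=time.sleep, warn=print):
    """Poll get_refcnt() until it returns 0.

    Emits a single warning once the wait has lasted at least
    `warn_after` seconds, then keeps waiting. Returns True if the
    warning fired, False otherwise. All names here are hypothetical;
    this only models the idea of a wait loop with a one-shot warning.
    """
    start = now()
    warned = False
    while True:
        refs = get_refcnt()
        if refs == 0:
            return warned
        elapsed = now() - start
        if not warned and elapsed >= warn_after:
            # One-shot report, so a stuck reference is flagged once
            # instead of flooding the log on every poll.
            warn("WARNING: still waiting after %.1fs, usage count = %d"
                 % (elapsed, refs))
            warned = True
        sleep(poll_interval)
```

The clock, sleep and warning sink are passed in so the loop can be
exercised with a fake clock. Choosing warn_after is the hard part raised
in the quoted text: it has to be longer than any legitimate wait, or the
warning produces false positives.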