On Wed, Aug 23, 2023 at 00:38:46PM +0300, Ziqi Zhao wrote:
> On Tue, Aug 22, 2023 at 01:40:45PM +0300, Nikolay Aleksandrov wrote:
> > Thank you for testing, but we really need to understand what is going on
> > and why the device isn't getting deleted for so long. Currently I don't
> > have the time to debug it properly (I'll be able to next week at the
> > earliest). We can't apply the patch based only on tests without
> > understanding the underlying issue. I'd look into what
> > the reproducer is doing exactly and also check the system state while the
> > deadlock has happened. Also you can list the currently held locks (if
> > CONFIG_LOCKDEP is enabled) via magic sysrq + d for example. See which
> > process is holding them, what are their priorities and so on.
> > Try to build some theory of how a deadlock might happen and then go
> > about proving it. Does the 8021q module have the same problem? It uses
> > similar code to set its hook.
>
> Hi Nik,
>
> Thank you so much for the instructions! I was able to obtain a decoded
> stacktrace showing the reproducer behavior in my QEMU VM running kernel
> 6.5-rc4, in case that would give us more context for pinpointing the
> problem. Here's a link to the output:
>
> https://pastecat.io/?p=IlKZlflN9j2Z2mspjKe7
>
> Basically, after running the reproducer (line 1854) for about 180
> seconds or so, the unregister_netdevice warning was shown (line 1856),
> and then after another 50 seconds, the kernel detected that some tasks
> have been stalled for more than 143 seconds (line 1866), so it panicked
> on the blocked tasks (line 2116). Before the panic, we did get to see
> all the locks held in the system (line 2068), and it did show that many
> processes created by the reproducer were contending the br_ioctl_mutex.
> I'm now starting to wonder whether this is really a deadlock, or simply
> some tasks not being able to grab the lock because so many processes
> are trying to acquire it.
>
> Let me know what you think about the situation shown in the above log,
> and let's keep in touch for any future debugging. Thank you again for
> guiding me through the problem!
>
> Best regards,
> Ziqi

Hello,

I've also encountered this bug while fuzzing. Is there any ongoing work
on this bug?

-- 
2.42.1