On 11/11/2015 08:49 AM, Shuah Khan wrote:
> On 11/11/2015 08:36 AM, Mauro Carvalho Chehab wrote:
>> On Wed, 11 Nov 2015 07:22:47 -0700
>> Shuah Khan <shuahkh@xxxxxxxxxxxxxxx> wrote:
>>
>>> On 11/11/2015 05:30 AM, Mauro Carvalho Chehab wrote:
>>>> On Mon, 09 Nov 2015 08:55:06 -0700
>>>> Shuah Khan <shuahkh@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>>> On 11/09/2015 08:51 AM, Shuah Khan wrote:
>>>>>> As I mentioned on IRC, here is the log for the problems I am seeing.
>>>>>> I have to eject the HVR 950Q TV stick to see the problem.
>>>>>>
>>>>>> mc_next_gen.v8.4 branch with no changes.
>>>>>>
>>>>>> I can test and debug this week.
>>>>>>
>>>>>> thanks,
>>>>>> -- Shuah
>>>>>>
>>>>>
>>>>> Forgot to cc linux-media, just in case others are interested
>>>>> and have ideas on debugging.
>>>>> snip
>>>>
>>>> Sorry, but I fail to see how this is related to the V4L2 subsystem.
>>>>
>>>> At least in my eyes, it seems that the bug is somewhere in the Radeon
>>>> driver.
>>>>
>>>
>>> Mauro,
>>>
>>> I think you didn't look down the dmesg far enough. The following is the
>>> problem I am talking about, and you will see media_device_unregister()
>>> on the stack. This occurs as soon as the device is removed.
>>
>> Shuah,
>>
>> I saw that, but it is clear from the above log that the Radeon
>> driver is broken and has some bad lock dependencies with the
>> driver_attach locks. Any other bad lock reports related to the
>> Radeon driver or the driver binding/unbinding code are very likely
>> related to the above bug.
>>
>> You should either fix the bad lock in the Radeon driver or not
>> load it at all, in order to get reliable results about possible
>> locking troubles with the MC drivers from the kernel lock tests.
>>
>
> Yeah, the Radeon driver bug could be making things worse. Did you see
> any problems with device removal during your testing?
>
> OK, found the following commit that fixes the problem:
> 7231ed1a813e0a9d249bbbe58e66ca058aee83e1
>
> This went into 4.2-rc4 or rc5. I will test applying just this
> one patch to the mc_next_gen.v8.4 branch and see if the device removal
> problem also goes away.
>

I applied the ACPI backlight fix, and now the kernel hangs solid when the
device is removed. I managed to get a stack trace by enabling sysrq, and it
showed media_device_unregister_entity() trying to acquire a spinlock
(spin_lock() -> raw_spin_lock()) on the stack. It is the same trace as the
one in the dmesg I sent.

I think we have several calls to media_device_unregister_entity() from
various media core drivers (DVB, V4L2, bridge driver) during device removal,
from their unregister paths. This adds a lot of contention on mdev->lock.
media_device_unregister() also calls media_device_unregister_entity() on all
the mdev entities.

I am not testing with my ALSA patches at the moment. When those get added,
the media_devnode_is_registered() check that is meant to ensure only one of
them (the bridge driver or snd-usb-audio) runs media_device_unregister()
won't work, because the MEDIA_FLAG_REGISTERED flag gets cleared towards the
end of media_device_unregister(). media_device_unregister() needs to be safe
to be run by these two drivers and still do the work only once.

media_device_unregister() does a lot: it removes interface links and
interfaces, and then unregisters entities, before it removes the media
device devnode file and calls media_devnode_unregister() to clear the
MEDIA_FLAG_REGISTERED bit.

I see two problems to solve:

- ensure media_device_unregister() is safe to be called by one or more
  drivers during device removal (USB disconnect in this case)
- reduce contention on mdev->lock during device removal

I have some ideas on how to do this. I can work on them and send patches.
Sounds like a plan?

thanks,
-- Shuah

-- 
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
shuahkh@xxxxxxxxxxxxxxx | (970) 217-8978