On Tue, 2009-03-24 at 11:23 -0600, Alex Chiang wrote: > > There is no bug -- it's a false positive in a way. I've pointed this out > > in the original thread, see > > http://thread.gmane.org/gmane.linux.kernel/550877/focus=550932 > > I'm actually a bit confused now. Sorry. > Peter explained why flushing a workqueue from the same queue is > bad, and in general I agree, but what do you mean by "false > positive"? Well, even though generally flushing it from within is bad, the actual thing lockdep reports is bogus -- it's reporting a nested locking. > By the way, this scenario: > > code path 1: > my_function() -> lock(L1); ...; flush_workqueue(); ... > > code path 2: > run_workqueue() -> my_work() -> ...; lock(L1); ... > > is _not_ what is happening here. Indeed. > So what you really have going on is: > > sysfs callback -> add remove callback to global workqueue > remove callback fires off (pci_remove_bus_device) and we do... > device_unregister > driver's ->remove method called > driver's ->remove method calls flush_scheduled_work > > Yes, after read the thread I agree that generically calling > flush_workqueue in the middle of run_workqueue is bad, but the > lockdep warning that Kenji showed us really won't deadlock. Exactly that is what I meant by "false positive". > This is because pci_remove_bus_device() will not acquire any lock > L1 that an individual device driver will attempt to acquire in > the remove path. If that were the case, we would deadlock every > time you rmmod'ed a device driver's module or every time you shut > your machine down. > > I think from my end, there are 2 things I need to do: > > a) make sysfs_schedule_callback() use its own work queue > instead of global work queue, because too many drivers > call flush_scheduled_work in their remove path > > b) give sysfs attributes the ability to commit suicide > > (a) is short term work, 2.6.30 timeframe, since it doesn't > involve any large conceptual changes. > > (b) is picking up Tejun Heo's existing work, but that was a bit > controversial last time, and I'm not sure it will make it during > this merge window. > > Question for the lockdep folks though -- given what I described, > do you agree that the warning we saw was a false positive? Or am > I off in left field? I think we're not sure yet -- it seems Lai Jiangshan described a scenario in which flushing from within the work actually _can_ deadlock. johannes
Attachment:
signature.asc
Description: This is a digitally signed message part