Re: INFO: possible circular locking dependency at cleanup_workqueue_thread

"Rafael J. Wysocki" <rjw@xxxxxxx> · Sun, 24 May 2009 13:09:13 +0200



On Sunday 24 May 2009, Ming Lei wrote:
> On Sun, 24 May 2009 01:20:29 +0200,
> "Rafael J. Wysocki" <rjw@xxxxxxx> wrote:
>
> > On Saturday 23 May 2009, Johannes Berg wrote:
> > > On Sat, 2009-05-23 at 00:23 +0200, Rafael J. Wysocki wrote:
> > >
> > > > > I just arrived at the same conclusion, heh. I can't say I
> > > > > understand these changes though, the part about calling the
> > > > > platform differently may make sense, but why disable
> > > > > non-boot CPUs at a different place?
> > > >
> > > > Because the ordering of platform callbacks and cpu[_up()|_down()]
> > > > is also important, at least on resume.
> > > >
> > > > In principle we can call device_pm_unlock() right before calling
> > > > disable_nonboot_cpus() and take the lock again right after calling
> > > > enable_nonboot_cpus(), if that helps.
> > >
> > > Probably, unless the cpu_add_remove_lock wasn't a red herring after
> > > all. I'd test, but I don't have much time today, will be travelling
> > > tomorrow and be at UDS all week next week so I don't know when I'll
> > > get to it -- could you provide a patch and also attach it to
> > > http://bugzilla.kernel.org/show_bug.cgi?id=13245 please? Miles (the
> > > reporter of that bug) has been very helpful in testing before.
> >
> > OK
> >
> > The patch is appended for reference (Alan, please have a look; I
> > can't recall why exactly we have called device_pm_lock() from the
> > core suspend/hibernation code instead of acquiring the lock locally
> > in drivers/base/power/main.c) and I'll attach it to the bug entry too.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > From: Rafael J. Wysocki <rjw@xxxxxxx>
> > Subject: PM: Do not hold dpm_list_mtx while disabling/enabling
> > nonboot CPUs
> >
> > We shouldn't hold dpm_list_mtx while executing
> > [disable|enable]_nonboot_cpus(), because theoretically this may lead
> > to a deadlock as shown by the following example (provided by Johannes
> > Berg):
> >
> > CPU 3       CPU 2                     CPU 1
> >                                       suspend/hibernate
> >             something:
> >             rtnl_lock()               device_pm_lock()
> >                                        -> mutex_lock(&dpm_list_mtx)
> >
> >             mutex_lock(&dpm_list_mtx)
> >
> > linkwatch_work
> >  -> rtnl_lock()
> >                                       disable_nonboot_cpus()
> >                                        -> flush CPU 3 workqueue
> >
> > Fortunately, device drivers are supposed to stop any activities that
> > might lead to the registration of new device objects and/or to the
> > removal of the existing ones way before disable_nonboot_cpus() is
> > called, so it shouldn't be necessary to hold dpm_list_mtx over the
> > entire late part of device suspend and early part of device resume.
> >
> > Thus, during the late suspend and the early resume of devices acquire
> > dpm_list_mtx only when dpm_list is going to be traversed and release
> > it right after that.
> >
> > Signed-off-by: Rafael J. Wysocki <rjw@xxxxxxx>
> > ---
> >  drivers/base/power/main.c |    4 ++++
> >  kernel/kexec.c            |    2 --
> >  kernel/power/disk.c       |   21 +++------------------
> >  kernel/power/main.c       |    7 +------
> >  4 files changed, 8 insertions(+), 26 deletions(-)
> >
>
> I tried to apply the patch against the latest next tree (2009-05-22), but
> "patch -p1" failed:
>
> [lm@linux-lm linux-2.6]$ patch -p1 <  ../patch_rx/INFO_possible_circular_locking_dependency_at_cleanup_workqueue_thread.patch
> patching file kernel/power/disk.c
> Hunk #1 succeeded at 215 with fuzz 2.
> Hunk #3 succeeded at 278 with fuzz 1.
> Hunk #4 FAILED at 343.
> Hunk #5 succeeded at 396 with fuzz 2 (offset -4 lines).
> Hunk #6 FAILED at 454.
> Hunk #7 succeeded at 485 with fuzz 2.
> 2 out of 7 hunks FAILED -- saving rejects to file kernel/power/disk.c.rej
> patching file kernel/power/main.c
> Hunk #1 succeeded at 289 with fuzz 1 (offset 18 lines).
> patching file drivers/base/power/main.c
> Hunk #3 succeeded at 616 with fuzz 2.
> Hunk #4 succeeded at 625 with fuzz 2.
> patching file kernel/kexec.c
> Hunk #1 succeeded at 1451 with fuzz 2.
> Hunk #2 succeeded at 1488 with fuzz 2.

The patch applies to the mainline, since it'll be a 2.6.30 candidate if it's
confirmed to fix the problem.

Thanks,
Rafael