On Saturday, June 15, 2013 10:17:42 PM Rafael J. Wysocki wrote: > On Saturday, June 15, 2013 09:44:28 AM Jiang Liu wrote: > > On Sat 15 Jun 2013 06:21:02 AM CST, Rafael J. Wysocki wrote: > > > On Saturday, June 15, 2013 03:27:59 AM Jiang Liu wrote: > > >> This is a preparation for next patch to avoid breaking bisecting. > > >> If next patch is applied without this one, it will cause deadlock > > >> as below: > > >> > > >> Case 1: > > >> [ 31.015593] Possible unsafe locking scenario: > > >> > > >> [ 31.018350] CPU0 CPU1 > > >> [ 31.019691] ---- ---- > > >> [ 31.021002] lock(&dock_station->hp_lock); > > >> [ 31.022327] lock(&slot->crit_sect); > > >> [ 31.023650] lock(&dock_station->hp_lock); > > >> [ 31.025010] lock(&slot->crit_sect); > > >> [ 31.026342] > > >> > > >> Case 2: > > >> hotplug_dock_devices() > > >> mutex_lock(&ds->hp_lock) > > >> dd->ops->handler() > > >> register_hotplug_dock_device() > > >> mutex_lock(&ds->hp_lock) > > >> [ 34.316570] [ INFO: possible recursive locking detected ] > > >> [ 34.316573] 3.10.0-rc4 #6 Tainted: G C > > >> [ 34.316575] --------------------------------------------- > > >> [ 34.316577] kworker/0:0/4 is trying to acquire lock: > > >> [ 34.316579] (&dock_station->hp_lock){+.+.+.}, at: > > >> [<ffffffff813c766b>] register_hotplug_dock_device+0x6a/0xbf > > >> [ 34.316588] > > >> but task is already holding lock: > > >> [ 34.316590] (&dock_station->hp_lock){+.+.+.}, at: > > >> [<ffffffff813c7270>] hotplug_dock_devices+0x2c/0xda > > >> [ 34.316595] > > >> other info that might help us debug this: > > >> [ 34.316597] Possible unsafe locking scenario: > > >> > > >> [ 34.316599] CPU0 > > >> [ 34.316601] ---- > > >> [ 34.316602] lock(&dock_station->hp_lock); > > >> [ 34.316605] lock(&dock_station->hp_lock); > > >> [ 34.316608] > > >> *** DEADLOCK *** > > >> > > >> So fix this deadlock by not taking ds->hp_lock in function > > >> register_hotplug_dock_device(). This patch also fixes a possible > > >> race conditions in function dock_event() because previously it > > >> accesses ds->hotplug_devices list without holding ds->hp_lock. > > >> > > >> Signed-off-by: Jiang Liu <jiang.liu@xxxxxxxxxx> > > >> Cc: Len Brown <lenb@xxxxxxxxxx> > > >> Cc: "Rafael J. Wysocki" <rjw@xxxxxxx> > > >> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > > >> Cc: Yinghai Lu <yinghai@xxxxxxxxxx> > > >> Cc: Yijing Wang <wangyijing@xxxxxxxxxx> > > >> Cc: linux-acpi@xxxxxxxxxxxxxxx > > >> Cc: linux-kernel@xxxxxxxxxxxxxxx > > >> Cc: linux-pci@xxxxxxxxxxxxxxx > > >> Cc: <stable@xxxxxxxxxxxxxxx> # 3.8+ > > >> --- > > >> drivers/acpi/dock.c | 109 ++++++++++++++++++++++--------------- > > >> drivers/pci/hotplug/acpiphp_glue.c | 15 +++++ > > >> include/acpi/acpi_drivers.h | 2 + > > >> 3 files changed, 82 insertions(+), 44 deletions(-) > > >> > > >> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c > > >> index 02b0563..602bce5 100644 > > >> --- a/drivers/acpi/dock.c > > >> +++ b/drivers/acpi/dock.c > > >> @@ -66,7 +66,7 @@ struct dock_station { > > >> spinlock_t dd_lock; > > >> struct mutex hp_lock; > > >> struct list_head dependent_devices; > > >> - struct list_head hotplug_devices; > > >> + struct klist hotplug_devices; > > >> > > >> struct list_head sibling; > > >> struct platform_device *dock_device; > > >> @@ -76,12 +76,18 @@ static int dock_station_count; > > >> > > >> struct dock_dependent_device { > > >> struct list_head list; > > >> - struct list_head hotplug_list; > > >> + acpi_handle handle; > > >> +}; > > >> + > > >> +struct dock_hotplug_info { > > >> + struct klist_node node; > > >> acpi_handle handle; > > >> const struct acpi_dock_ops *ops; > > >> void *context; > > >> }; > > > > > > Can we please relax a bit and possibly take a step back? > > > > > > So since your last reply to me wasn't particularly helpful, I went through the > > > code in dock.c and acpiphp_glue.c and I simply think that the whole > > > hotplug_list thing is simply redundant. > > > > > > It looks like instead of using it (or the klist in this patch), we can add a > > > "hotlpug_device" flag to dock_dependent_device and set that flag instead of > > > adding dd to hotplug_devices or clear it instead of removing dd from that list. > > > > > > That would allow us to avoid the deadlock, because we wouldn't need the hp_lock > > > any more and perhaps we could make the code simpler instead of making it more > > > complex. > > > > > > How does that sound? > > > > > > Rafael > > Hi Rafael, > > Thanks for comments! It would be great if we could kill the > > hotplug_devices > > list so thing gets simple. But there are still some special cases:( > > > > As you have mentioned, ds->hp_lock is used to make both addition and > > removal > > of hotplug devices wait for us to complete walking ds->hotplug_devices. > > So it acts as two roles: > > 1) protect the hotplug_devices list, > > 2) serialize unregister_hotplug_dock_device() and > > hotplug_dock_devices() so > > the dock driver doesn't access registered handler and associated data > > structure > > once returing from unregister_hotplug_dock_device(). > > When it returns from unregister_hotplug_dock_device(), nothing prevents it > from accessing whatever it wants, because ds->hp_lock is not used outside > of the add/del and hotplug_dock_devices(). So, the actual role of > ds->hp_lock (not the one that it is supposed to play, but the real one) > is to prevent addition/deletion from happening when hotplug_dock_devices() > is running. [Yes, it does protect the list, but since the list is in fact > unnecessary, that doesn't matter.] > > > If we simply use a flag to mark presence of registered callback, we > > can't achieve the second goal. > > I don't mean using the flag *alone*. > > > Take the sony laptop as an example. It has several PCI > > hotplug > > slot associated with the dock station: > > [ 28.829316] acpiphp_glue: _handle_hotplug_event_func: Bus check > > notify on \_SB_.PCI0.RP07.LPMB > > [ 30.174964] acpiphp_glue: _handle_hotplug_event_func: Bus check > > notify on \_SB_.PCI0.RP07.LPMB.LPM0 > > [ 30.174973] acpiphp_glue: _handle_hotplug_event_func: Bus check > > notify on \_SB_.PCI0.RP07.LPMB.LPM1 > > [ 30.174979] acpiphp_glue: _handle_hotplug_event_func: Bus check > > notify on \_SB_.PCI0.RP07.LPMB.LPM2 > > [ 30.174985] acpiphp_glue: _handle_hotplug_event_func: Bus check > > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GFXA > > [ 30.175020] acpiphp_glue: _handle_hotplug_event_func: Bus check > > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR0.GHDA > > [ 30.175040] acpiphp_glue: _handle_hotplug_event_func: Bus check > > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC0.DLAN > > [ 30.175050] acpiphp_glue: _handle_hotplug_event_func: Bus check > > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC1.DODD > > [ 30.175060] acpiphp_glue: _handle_hotplug_event_func: Bus check > > notify on \_SB_.PCI0.RP07.LPMB.LPM2.LPRI.LPR1.LPCI.LPC2.DUSB > > > > So it still has some race windows if we undock the station while > > repeatedly rescanning/removing > > the PCI bus for \_SB_.PCI0.RP07.LPMB.LPM0 through sysfs interfaces. Which sysfs interfaces do you mean, by the way? If you mean "eject", then it takes acpi_scan_lock and hotplug_dock_devices() should always be run under acpi_scan_lock too. It isn't at the moment, because write_undock() doesn't take acpi_scan_lock(), but this is an obvious bug (so I'm going to send a patch to fix it in a while). With that bug fixed, the possible race between acpi_eject_store() and hotplug_dock_devices() should be prevented from happening, so perhaps we're worrying about something that cannot happen? Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html