[+cc Yinghai] On Fri, Jan 25, 2013 at 3:02 AM, Gu Zheng <guz.fnst@xxxxxxxxxxxxxx> wrote: > Hi Bjorn, > Thanks for your review and comments! Please refer to inlined comments below. > > On 01/25/2013 07:12 AM, Bjorn Helgaas wrote: > >> On Thu, Dec 27, 2012 at 12:42 AM, Lin Feng <linfeng@xxxxxxxxxxxxxx> wrote: >>> There is a potential deadlock situation when we manipulate the pci-sysfs user >>> interfaces from different bus hierarchy simultaneously, described as following: >>> >>> path1: sysfs remove device: | path2: sysfs rescan device: >>> sysfs_schedule_callback_work() | sysfs_write_file() >>> remove_callback() | flush_write_buffer() >>> *1* mutex_lock(&pci_remove_rescan_mutex)|*2* sysfs_get_active(attr_sd) >>> ... | dev_attr_store() >>> device_remove_file() | dev_rescan_store() >>> ... |*4* mutex_lock(&pci_remove_rescan_mutex) >>> *3* sysfs_deactivate(sd) | ... >>> wait_for_completion() |*5* sysfs_put_active(attr_sd) >>> *6* mutex_unlock(&pci_remove_rescan_mutex) > ...snip... >>> Reported-by: Taku Izumi <izumi.taku@xxxxxxxxxxxxxx> >>> Signed-off-by: Lin Feng <linfeng@xxxxxxxxxxxxxx> >>> Signed-off-by: Gu Zheng <guz.fnst@xxxxxxxxxxxxxx> >>> --- >>> drivers/pci/pci-sysfs.c | 42 ++++++++++++++++++++++++++---------------- >>> 1 files changed, 26 insertions(+), 16 deletions(-) >>> >>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c >>> index 05b78b1..d2efbb0 100644 >>> --- a/drivers/pci/pci-sysfs.c >>> +++ b/drivers/pci/pci-sysfs.c >>> @@ -295,10 +295,13 @@ static ssize_t bus_rescan_store(struct bus_type *bus, > const char *buf, >>> return -EINVAL; >>> >>> if (val) { >>> - mutex_lock(&pci_remove_rescan_mutex); >>> - while ((b = pci_find_next_bus(b)) != NULL) >>> - pci_rescan_bus(b); >>> - mutex_unlock(&pci_remove_rescan_mutex); >>> + if (mutex_trylock(&pci_remove_rescan_mutex)) { >>> + while ((b = pci_find_next_bus(b)) != NULL) >>> + pci_rescan_bus(b); >>> + mutex_unlock(&pci_remove_rescan_mutex); >>> + } else { >>> + return 0; >> What are the semantics of returning 0 from a sysfs store function? >> Does the user's write just get dropped? I would think we'd return >> "count" for that case. > > Oh, yes, return "count" seems suitable here, although we did not reach the > user's target goal(rescan the bus), but the user's write has been flushed yet. > But the user still can not judge whether pci_rescan_bus() was successfully done > only by the return value. Shall we return a suitable error here to tell the user > that his write was written, but pci_rescan_bus() was not done ? > >> Is there some sort of automatic retry in libc >> or something if we return zero? > > No, there is not any extra operations in libc if we return zero indeed. > >> Are you relying on the user code to >> notice that nothing was written and do its own retry? >> > > Yes, but it seems impractical. > >> The last seems most likely, but that seems like it complicates the >> user's life unnecessarily. >> >>> + } >>> } >>> return count; >>> } >>> @@ -319,9 +322,12 @@ dev_rescan_store(struct device *dev, struct device_attribute *attr, >>> return -EINVAL; >>> >>> if (val) { >>> - mutex_lock(&pci_remove_rescan_mutex); >>> - pci_rescan_bus(pdev->bus); >>> - mutex_unlock(&pci_remove_rescan_mutex); >>> + if (mutex_trylock(&pci_remove_rescan_mutex)) { >>> + pci_rescan_bus(pdev->bus); >>> + mutex_unlock(&pci_remove_rescan_mutex); >>> + } else { >>> + return 0; >>> + } >>> } >>> return count; >>> } >>> @@ -330,9 +336,10 @@ static void remove_callback(struct device *dev) >>> { >>> struct pci_dev *pdev = to_pci_dev(dev); >>> >>> - mutex_lock(&pci_remove_rescan_mutex); >>> - pci_stop_and_remove_bus_device(pdev); >>> - mutex_unlock(&pci_remove_rescan_mutex); >>> + if (mutex_trylock(&pci_remove_rescan_mutex)) { >>> + pci_stop_and_remove_bus_device(pdev); >>> + mutex_unlock(&pci_remove_rescan_mutex); >>> + } >> In the other cases, I think the user will at least get some >> indication, e.g., a write() that returns zero, when we abort. But >> here, we silently skip the pci_stop_and_remove_bus_device(). That >> sounds wrong to me. What actually happens here, and why is it OK to >> skip it? > > Yeah, the hasty skip seems not suitable. We should give out some information > here, if we can not do pci_stop_and_remove_bus_device(). > >> Can we avoid the deadlock by queuing these in a workqueue instead of >> using the mutex_trylock() approach? >> > > No, I think use a workqueue to queue the rescan routine into workqueue as the > remove is not suitable. > After we queue the scan-bus work into workqueue, the rescan routine can > return directly(case1) or wait until work is completed(case2). > case1: > If we return directly after we queue the scan-bus work into workqueue. > Maybe this work will be scheduled some time later. If there is a > user's routine want to use a new-added device before the scan-bus work is > successfully done will fail. It can avoid the deadlock, but the rescan routine > is executed asynchronously. > case2: > If we wait until work is completed after we queue the scan-bus work into > workqueue. The s_active of the sys_dirent is still hold by us, so this approach > could not avoid the deadlock. I don't think the asynchronous nature of case 1 is a fatal problem. I'm not sure we can guarantee that a specific device is added and ready for use when the sysfs write completes, so I'm dubious about user space making assumptions about a device being ready. It should probably use a different mechanism, like udev, to learn about a new device. I'm sorry that you tripped over this deadlock, because now I'm worried about related locking issues outside sysfs :) The mutex you're fiddling with is only in sysfs, but the routines *protected* by that mutex are used in other places, too. So what happens when a hotplug driver does a rescan at the same time a user does a rescan or remove via sysfs? I don't even know what the rules are for protecting scan/remove, but I don't have confidence that the issue you're fixing is the only one. Bjorn >>> } >>> >>> static ssize_t >>> @@ -366,12 +373,15 @@ dev_bus_rescan_store(struct device *dev, struct > device_attribute *attr, >>> return -EINVAL; >>> >>> if (val) { >>> - mutex_lock(&pci_remove_rescan_mutex); >>> - if (!pci_is_root_bus(bus) && list_empty(&bus->devices)) >>> - pci_rescan_bus_bridge_resize(bus->self); >>> - else >>> - pci_rescan_bus(bus); >>> - mutex_unlock(&pci_remove_rescan_mutex); >>> + if (mutex_trylock(&pci_remove_rescan_mutex)) { >>> + if (!pci_is_root_bus(bus) && list_empty(&bus->devices)) >>> + pci_rescan_bus_bridge_resize(bus->self); >>> + else >>> + pci_rescan_bus(bus); >>> + mutex_unlock(&pci_remove_rescan_mutex); >>> + } else { >>> + return 0; >>> + } >>> } >>> return count; >>> } >>> -- >>> 1.7.1 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html