On Sat, Feb 23, 2013 at 07:13:11PM -0800, Yinghai Lu wrote: > On Sat, Feb 23, 2013 at 4:20 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote: > > I think this is a general object lifetime issue that really has > > nothing to do with ASPM except that ASPM happens to be the victim. > > > > You're doing this: > > > > echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 > >> /sys/bus/pci/devices/0000\:1a\:01.0/remove > > > > The 1a:01.0 device is downstream from the 10:00.0 bridge. The sysfs > > interface remove_store() uses device_schedule_callback() to schedule > > the remove for later. I think what's happening is that we schedule > > remove_callback() for both devices before 10:00.0 has been removed, > > like this: > > > > # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove > > remove_store # for 10:00.0 > > device_schedule_callback(10:00.0, remove_callback) > > sysfs_schedule_callback > > kobject_get > > queue_work > > # echo -n 1 > /sys/bus/pci/devices/0000\:1a\:01.0/remove > > remove_store # for 1a:01.0 > > device_schedule_callback(1a:01.0, remove_callback) > > sysfs_schedule_callback > > kobject_get > > queue_work > > > > Note that we acquire a reference on each pci_dev before queuing the work item. > > > > Later, we run the callbacks, starting with 10:00.0. This calls > > remove_callback() to perform the remove: > > > > remove_callback(10:00.0) > > mutex_lock(&pci_remove_rescan_mutex) > > pci_stop_and_remove_bus_device(pdev) > > mutex_unlock(&pci_remove_rescan_mutex) > > > > This will stop and remove the subtree below 10:00.0, but it does not > > actually free the pci_dev for 1a:01.0 because we increased its ref > > count in sysfs_schedule_callback. So after completing > > remove_callback(10:00.0), we run the second callback for 1a:01.0. > > > > The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from > > pci_stop_dev(). This is where we blow up because, according to your > > debugging, pdev->bus->self is no longer valid. > > > > The PCI core did this removal wrong. If we have a valid pci_dev > > pointer, as we do in pcie_aspm_exit_link_state(), the whole object > > ought to be valid. But the PCI core deallocated the struct pci_bus > > for bus 0000:1a too soon. > > > > My guess is that when we build a pci_dev, we need to increase the ref > > count on the pci_bus where that pci_dev lives. That way we can keep > > around all the buses and bridges leading from the root to the device > > in question. > > Agreed, Please check if attached is what you want. I included the patch directly below for convenience. > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index b494066..6df5428 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1337,6 +1337,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) > */ > down_write(&pci_bus_sem); > list_add_tail(&dev->bus_list, &bus->devices); > + get_device(&bus->dev); > up_write(&pci_bus_sem); This suggests that taking the reference must be protected by pci_bus_sem, but I don't think that's the case, is it? > pci_fixup_device(pci_fixup_final, dev); > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > index cc875e6..a72b700 100644 > --- a/drivers/pci/remove.c > +++ b/drivers/pci/remove.c > @@ -36,6 +36,7 @@ static void pci_destroy_dev(struct pci_dev *dev) > { > down_write(&pci_bus_sem); > list_del(&dev->bus_list); > + put_device(&dev->bus->dev); > up_write(&pci_bus_sem); > > pci_free_resources(dev); The problem is that we capture the struct pci_bus pointer without taking a reference on the bus object. The capture occurs in pci_scan_device() (and other callers of alloc_pci_dev()) where we do this: dev = alloc_pci_dev(); dev->bus = bus; To make code analysis easier, I think we should take the reference at the exact point where we capture the pointer, using something like: dev->bus = pci_bus_get(bus); It'd probably be even better to deprecate alloc_pci_dev() and make a pci_alloc_dev(struct pci_bus *bus) that takes care of acquiring the reference itself. But I don't think this takes care of the whole issue. What protects the pci_scan_slot() path against concurrent removal of the pci_bus? I don't see anything in the hotplug drivers or in the pci_scan_child_bus() path that acquires a reference or does locking for this. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html