Re: Commit ef83b0781a73f (PCI: Remove from bus_list and release resources in pci_release_dev()) broke TBT hotplug

On Friday, January 31, 2014 12:53:01 PM Mika Westerberg wrote:
> On Fri, Jan 31, 2014 at 01:38:42AM +0100, Rafael J. Wysocki wrote:
> > On Friday, January 31, 2014 12:59:06 AM Rafael J. Wysocki wrote:
> > > On Thursday, January 30, 2014 03:39:02 PM Yinghai Lu wrote:
> > > > On Thu, Jan 30, 2014 at 3:39 PM, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> > > > > On Thursday, January 30, 2014 08:56:05 AM Yinghai Lu wrote:
> > > > >>
> > > > >> >> The latest mainline kernel "hangs" when Thunderbolt devices are
> > > > > >> >> hot-unplugged from the system. I can't see any oops but after hot-unplug I'm
> > > > >> >> getting huge amounts of messages like:
> > > > >> >>
> > > > >> >> [  352.717001] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717011] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717021] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717032] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717041] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717051] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717061] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717070] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717083] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717094] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717104] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717113] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717124] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717133] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717143] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717153] pci 0000:02:00.0: PME# disabled
> > > > >> >> [  352.717162] pci 0000:02:00.0: PME# disabled
> > > > >> >
> > > > > >> > Does that mean pci_stop_dev() gets called again and again?
> > > > >>
> > > > > >> Please check if the attached patch helps.
> > > > >
> > > > > Well, it looks like what happens is an endless loop in
> > > > > acpiphp_glue.c:disable_slot().
> > > > >
> > > > > dev_in_slot() returns the first device in the list, so
> > > > > pci_stop_and_remove_bus_device() is called for it, but it
> > > > > doesn't remove the device from bus->devices any more, so
> > > > > dev_in_slot() will return the same device next time and
> > > > > so on forever.
> > > > >
> > > > ...
> > > > >
> > > > > So the above won't help in my opinion.
> > > > >
> > > > > I wonder, however, if this patch helps instead:
> > > > >
> > > > > https://patchwork.kernel.org/patch/3540701/
> > > > >
> > > > > I thought it would be 3.15 material, but it very well can go in earlier if
> > > > > it happens to address this particular problem.
> > > > 
> > > > Agree, that should fix the problem.
> > > > 
> > > > but please use list_for_each_entry_safe_reverse
> > > > instead.
> > > 
> > > OK, I will.
> > 
> > Mika, below is an updated patch to try.
> > 
> > ---
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > Subject: ACPI / hotplug / PCI: Simplify disable_slot()
> > 
> > After recent PCI core changes related to the rescan/remove locking,
> > the ACPIPHP's disable_slot() function is only called under the
> > general PCI rescan/remove lock, so it doesn't have to use
> > dev_in_slot() any more to avoid race conditions.  Make it simply
> > walk the devices on the bus and drop the ones in the slot being
> > disabled and drop dev_in_slot() which has no more users.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> 
> Thanks for the fix.
> 
> Unfortunately, it now crashes here after I re-plug the TBT chain (I have
> both of your patches applied):
> 
> int sysfs_create_bin_file(struct kobject *kobj,
>                           const struct bin_attribute *attr)
> {
> 	BUG_ON(!kobj || !kobj->sd || !attr); <--
> 
> Since I don't have a proper serial console to that machine, all I see is the
> end of the backtrace :-(
> 
> Here is a hand copied backtrace from the screen:
> 
> pci_create_sysfs_dev_files()
> pci_bus_add_device()
> pci_bus_add_devices()
> enable_slot()
> acpiphp_check_bridge()
> hotplug_event()
> ...

So I think what happens is that we leak the struct pci_dev during removal and
the proper cleanup is never done.

Can you please add a debug printk into pci_release_dev() and see if that's
ever called after TBT unplug?
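
To illustrate the kind of leak I suspect, here is a tiny userspace model.
It is not kernel code, and every model_* name in it is made up for the
illustration.  model_release() plays the role of pci_release_dev(): it runs
only when the last reference is dropped, so a single missing put on the
removal path means it never runs, and a print in it tells the two cases
apart.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only; the names below are hypothetical, not kernel APIs. */
struct model_dev {
	int refcount;		/* stands in for the device reference count */
	int release_calls;	/* how many times the release hook has run */
};

/* Plays the role of pci_release_dev(): the only place cleanup happens. */
static void model_release(struct model_dev *dev)
{
	dev->release_calls++;
	/* The "debug printk": shows whether the release hook is reached. */
	fprintf(stderr, "model_release: called for %p\n", (void *)dev);
}

static void model_get(struct model_dev *dev)
{
	dev->refcount++;
}

static void model_put(struct model_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		model_release(dev);
}

/*
 * Model of a hot-unplug path.  A nonzero leak_ref models a path that
 * forgets to drop a reference it took during removal.
 * Returns the number of times the release hook ran.
 */
int model_unplug(int leak_ref)
{
	struct model_dev dev = { .refcount = 1, .release_calls = 0 };

	model_get(&dev);		/* extra reference taken while removing */
	if (!leak_ref)
		model_put(&dev);	/* balanced path drops it again */
	model_put(&dev);		/* drop the initial reference */

	return dev.release_calls;
}
```

With balanced get/put the message is printed once; with the leaked
reference it is never printed, which is the symptom to look for.  In the
kernel the equivalent check would be a single debug print at the top of
pci_release_dev().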

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.