On Mon, Sep 24, 2018 at 06:43:07PM -0500, Bjorn Helgaas wrote: > On Sat, Sep 22, 2018 at 08:10:57PM +0200, Lukas Wunner wrote: > > This cannot happen with what's in v4.19 or on the pci/hotplug branch > > because the IRQ is released right after the call to pci_hp_del(). > > If the button_work happens to be scheduled afterwards, it will call > > irq_wake_thread(ctrl->pcie->irq, ctrl). Now if you look at the > > implementation of irq_wake_thread(), it iterates over all the > > actions of the IRQ (in case it's shared by multiple devices) and > > finds the action which matches the ctrl pointer. But because we've > > already released the IRQ at that point, it won't find the IRQ action > > and just return. So irq_wake_thread() essentially becomes a no-op. > > This seems like a very subtle issue that we're likely to trip over > again. I assume most drivers can use device-managed IRQs safely, but > it's a problem here because of the extra complexity around > hotplug_slot? Is there some way we can make it less subtle, e.g., by > moving the slot cleanup into the pci_destroy_dev() path or something? The extra complexity is caused by the various ways in which events can be triggered (Slot Status register change, sysfs, button_work) plus the deregistration of hotplug_slot. Teardown paths are always hard, I don't really see a better code solution, but I should add more code comments to make it less subtle. > Is there some value we get from having both struct hotplug_slot and > struct pci_slot? When Alex Chiang introduced struct pci_slot, he thought of migrating struct hotplug_slot into it. See the "migrate over time" code comment in commit f46753c5e354. (It got changed to "move here" by 0aa0f5d1084c.) On the current pci/hotplug branch, I've embedded struct hotplug_slot in each hotplug driver's struct slot. But that doesn't preclude merging struct hotplug_slot into struct pci_slot, the hotplug drivers would just have to embed a struct pci_slot instead, or struct pci_slot would be embedded in struct hotplug_slot. TBH the rationale behind all the different structs isn't quite clear to me, e.g. struct pci_slot is also used for non-hotplug slots it seems, see 8344b568f5bd. (Note, Alex is no longer reachable under his hp.com address, same for his canonical.com address.) > > While I had thought of using devm_kzalloc(), I also knew that Bjorn > > wants to get rid of portdrv and I didn't want to make any changes that > > might constrain the future design of portdrv. Right now, portdrv > > creates a sub-device for each service and service drivers bind to it. > > An alternative design could either be a single driver that binds to > > ports and handles all services at once (probably wouldn't be all that > > different to the current situation), or no driver would be bound to > > ports and the functionality of the port services would be handled > > within the core. In either case we have the pci_dev of the port to > > attach the device-managed allocations to, so I suppose using > > devm_kzalloc() here isn't constraining us all that much, hence > > seems fine. > > I'd *like* to get rid of portdrv, but I don't see a clear path to do > that yet, so it might be impractical. One reason I don't like it is > that only one driver can bind to a device, so portdrv prevents us from > having drivers for device-specific bridge features like performance > monitors. For that reason, I'd like to have the architected services > handled directly in the core. Hm, well in that case we can't use devm_*() at all, as Benjamin Herrenschmidt has just correctly pointed out, devres_release_all() is called both on driver unbind in __device_release_driver() as well as on device destruction in device_release(). Thus, if the port services are handled in the core to allow a driver to bind to the port and devm_kzalloc() is used by the port services, those allocations will be freed when that driver unbinds, and any further use of the allocations is a use-after-free. Thanks, Lukas