On Sat, Sep 22, 2018 at 08:10:57PM +0200, Lukas Wunner wrote: > On Tue, Sep 18, 2018 at 05:58:48PM -0600, Keith Busch wrote: > > --- a/drivers/pci/hotplug/pciehp_core.c > > +++ b/drivers/pci/hotplug/pciehp_core.c > > /* Installs the interrupt handler */ > > - retval = request_threaded_irq(irq, pciehp_isr, pciehp_ist, > > - IRQF_SHARED, MY_NAME, ctrl); > > + retval = devm_request_threaded_irq(&ctrl->pcie->device, irq, pciehp_isr, > > + pciehp_ist, IRQF_SHARED, MY_NAME, > > + ctrl); > > While using devm_kzalloc() for the "ops" and "ctrl" allocations is fine, > using devm_request_threaded_irq() is *not* because it allows a > use-after-free of the hotplug_slot_name(): > > With your patch the teardown order becomes: > pci_hp_del(&ctrl->hotplug_slot); # After this user space can no longer > # issue enable/disable requests (but > # ->reset_slot() is still possible). > pcie_disable_notification(); # After this, the IRQ thread is no > # longer woken because of slot events. > pci_hp_destroy(hotplug_slot); # After this, hotplug_slot_name() must > # no longer be called. > cancel_delayed_work_sync(&slot->work); # After this, the IRQ thread is no > # longer woken by the button_work. > > What can happen here is the button_work gets scheduled before it is > canceled, it wakes up the IRQ thread, the IRQ thread brings up or > down the slot and prints messages to the log which include the > hotplug_slot_name(), leading to a use-after-free. > > This cannot happen with what's in v4.19 or on the pci/hotplug branch > because the IRQ is released right after the call to pci_hp_del(). > If the button_work happens to be scheduled afterwards, it will call > irq_wake_thread(ctrl->pcie->irq, ctrl). Now if you look at the > implementation of irq_wake_thread(), it iterates over all the > actions of the IRQ (in case it's shared by multiple devices) and > finds the action which matches the ctrl pointer. But because we've > already released the IRQ at that point, it won't find the IRQ action > and just return. So irq_wake_thread() essentially becomes a no-op. > > The teardown order is very meticulously designed, I went over it > dozens of times to ensure it's bullet-proof. The IRQ really has to > be released at exactly the right time and using a device-managed IRQ > breaks the correct order I'm afraid. Thanks for the very detailed analysis! This seems like a very subtle issue that we're likely to trip over again. I assume most drivers can use device-managed IRQs safely, but it's a problem here because of the extra complexity around hotplug_slot? Is there some way we can make it less subtle, e.g., by moving the slot cleanup into the pci_destroy_dev() path or something? Is there some value we get from having both struct hotplug_slot and struct pci_slot? > While I had thought of using devm_kzalloc(), I also knew that Bjorn > wants to get rid of portdrv and I didn't want to make any changes that > might constrain the future design of portdrv. Right now, portdrv > creates a sub-device for each service and service drivers bind to it. > An alternative design could either be a single driver that binds to > ports and handles all services at once (probably wouldn't be all that > different to the current situation), or no driver would be bound to > ports and the functionality of the port services would be handled > within the core. In either case we have the pci_dev of the port to > attach the device-managed allocations to, so I suppose using > devm_kzalloc() here isn't constraining us all that much, hence > seems fine. I'd *like* to get rid of portdrv, but I don't see a clear path to do that yet, so it might be impractical. One reason I don't like it is that only one driver can bind to a device, so portdrv prevents us from having drivers for device-specific bridge features like performance monitors. For that reason, I'd like to have the architected services handled directly in the core. Bjorn