Re: [PATCH 12/12] PCI/pciehp: Use device managed allocations

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Mon, 24 Sep 2018 18:43:07 -0500

On Sat, Sep 22, 2018 at 08:10:57PM +0200, Lukas Wunner wrote:
> On Tue, Sep 18, 2018 at 05:58:48PM -0600, Keith Busch wrote:
> > --- a/drivers/pci/hotplug/pciehp_core.c
> > +++ b/drivers/pci/hotplug/pciehp_core.c

> >  	/* Installs the interrupt handler */
> > -	retval = request_threaded_irq(irq, pciehp_isr, pciehp_ist,
> > -				      IRQF_SHARED, MY_NAME, ctrl);
> > +	retval = devm_request_threaded_irq(&ctrl->pcie->device, irq, pciehp_isr,
> > +					   pciehp_ist, IRQF_SHARED, MY_NAME,
> > +					   ctrl);
> 
> While using devm_kzalloc() for the "ops" and "ctrl" allocations is fine,
> using devm_request_threaded_irq() is *not* because it allows a
> use-after-free of the hotplug_slot_name():
> 
> With your patch the teardown order becomes:
>   pci_hp_del(&ctrl->hotplug_slot);       # After this user space can no longer
>                                          # issue enable/disable requests (but
>                                          # ->reset_slot() is still possible).
>   pcie_disable_notification();           # After this, the IRQ thread is no
>                                          # longer woken because of slot events.
>   pci_hp_destroy(hotplug_slot);          # After this, hotplug_slot_name() must
>                                          # no longer be called.
>   cancel_delayed_work_sync(&slot->work); # After this, the IRQ thread is no
>                                          # longer woken by the button_work.
> 
> What can happen here is the button_work gets scheduled before it is
> canceled, it wakes up the IRQ thread, the IRQ thread brings up or
> down the slot and prints messages to the log which include the
> hotplug_slot_name(), leading to a use-after-free.
> 
> This cannot happen with what's in v4.19 or on the pci/hotplug branch
> because the IRQ is released right after the call to pci_hp_del().
> If the button_work happens to be scheduled afterwards, it will call
> irq_wake_thread(ctrl->pcie->irq, ctrl).  Now if you look at the
> implementation of irq_wake_thread(), it iterates over all the
> actions of the IRQ (in case it's shared by multiple devices) and
> finds the action which matches the ctrl pointer.  But because we've
> already released the IRQ at that point, it won't find the IRQ action
> and just return.  So irq_wake_thread() essentially becomes a no-op.
> 
> The teardown order is very meticulously designed, I went over it
> dozens of times to ensure it's bullet-proof.  The IRQ really has to
> be released at exactly the right time and using a device-managed IRQ
> breaks the correct order I'm afraid.

Thanks for the very detailed analysis!  

This seems like a very subtle issue that we're likely to trip over
again.  I assume most drivers can use device-managed IRQs safely, but
it's a problem here because of the extra complexity around
hotplug_slot?  Is there some way we can make it less subtle, e.g., by
moving the slot cleanup into the pci_destroy_dev() path or something?

Is there some value we get from having both struct hotplug_slot and
struct pci_slot?

> While I had thought of using devm_kzalloc(), I also knew that Bjorn
> wants to get rid of portdrv and I didn't want to make any changes that
> might constrain the future design of portdrv.  Right now, portdrv
> creates a sub-device for each service and service drivers bind to it.
> An alternative design could either be a single driver that binds to
> ports and handles all services at once (probably wouldn't be all that
> different to the current situation), or no driver would be bound to
> ports and the functionality of the port services would be handled
> within the core.  In either case we have the pci_dev of the port to
> attach the device-managed allocations to, so I suppose using
> devm_kzalloc() here isn't constraining us all that much, hence
> seems fine.

I'd *like* to get rid of portdrv, but I don't see a clear path to do
that yet, so it might be impractical.  One reason I don't like it is
that only one driver can bind to a device, so portdrv prevents us from
having drivers for device-specific bridge features like performance
monitors.  For that reason, I'd like to have the architected services
handled directly in the core.

Bjorn