Re: [PATCH] PCI: Fix use-after-free of slot->bus on hot remove

Hi,

On Thu, Oct 10, 2024 at 07:10:34PM +0200, Lukas Wunner wrote:
> Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.
> 
> Since commit 0fc70886569c ("thunderbolt: Reset USB4 v2 host router") and
> commit 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot
> firmware"), USB4 v2 and v1 Host Routers are reset on probe of the
> thunderbolt driver.
> 
> The reset clears the Presence Detect State and Data Link Layer Link Active
> bits at the USB4 Host Router's Root Port and thus causes hot removal of
> the dock.

Can't this also happen when something is simply unplugged at some point
in the PCIe topology?  I don't think this is specific to TB/USB4.

> The crash occurs when pciehp is unbound from one of the dock's Downstream
> Ports:  pciehp creates a pci_slot on bind and destroys it on unbind.  The
> pci_slot contains a pointer to the pci_bus below the Downstream Port, but
> a reference on that pci_bus is never acquired.  The pci_bus is destroyed
> before the pci_slot, so a use-after-free ensues when pci_slot_release()
> accesses slot->bus.
> 
> In principle this should not happen because pci_stop_bus_device() unbinds
> pciehp (and therefore destroys the pci_slot) before the pci_bus is
> destroyed by pci_remove_bus_device().
> 
> However the stacktrace provided by Dennis shows that pciehp is unbound
> from pci_remove_bus_device() instead of pci_stop_bus_device().
> To understand the significance of this, one needs to know that the PCI
> core uses a two step process to remove a portion of the hierarchy:  It
> first unbinds all drivers in the sub-hierarchy in pci_stop_bus_device()
> and then actually removes the devices in pci_remove_bus_device().
> There is no precaution to prevent driver binding in-between
> pci_stop_bus_device() and pci_remove_bus_device().
> 
> In Dennis' case, it seems removal of the hierarchy by pciehp races with
> driver binding by pci_bus_add_devices().  pciehp is bound to the
> Downstream Port after pci_stop_bus_device() has run, so it is unbound by
> pci_remove_bus_device() instead of pci_stop_bus_device().  Because the
> pci_bus has already been destroyed at that point, accesses to it result in
> a use-after-free.
> 
> One might conclude that driver binding needs to be prevented after
> pci_stop_bus_device() has run.  However it seems risky that pci_slot
> points to pci_bus without holding a reference.  Solely relying on correct
> ordering of driver unbind versus pci_bus destruction is certainly not
> defensive programming.
> 
> If pci_slot has a need to access data in pci_bus, it ought to acquire a
> reference.  Amend pci_create_slot() accordingly.  Dennis reports that the
> crash is not reproducible with this change.
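
To illustrate the idea of having the slot hold a reference on the bus, here
is a minimal userspace sketch.  It is not the patch itself: struct model_bus,
struct model_slot and all helpers below are hypothetical stand-ins for the
kernel objects, and the plain integer counter only models what a real
reference count would do.  The point is that once slot creation takes a
reference, the bus stays alive until the slot is released, regardless of
teardown order.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel's pci_bus / pci_slot. */
struct model_bus {
	int refcount;
	const char *name;
};

struct model_slot {
	struct model_bus *bus;	/* counted reference, taken in slot_create() */
	int number;
};

static int buses_freed;	/* counts how many buses were actually freed */

static struct model_bus *bus_create(const char *name)
{
	struct model_bus *bus = malloc(sizeof(*bus));

	if (!bus)
		return NULL;
	bus->refcount = 1;	/* reference owned by the creator */
	bus->name = name;
	return bus;
}

static struct model_bus *bus_get(struct model_bus *bus)
{
	bus->refcount++;
	return bus;
}

static void bus_put(struct model_bus *bus)
{
	if (--bus->refcount == 0) {
		buses_freed++;
		free(bus);
	}
}

static struct model_slot *slot_create(struct model_bus *bus, int number)
{
	struct model_slot *slot = malloc(sizeof(*slot));

	if (!slot)
		return NULL;
	slot->bus = bus_get(bus);	/* pin the bus for the slot's lifetime */
	slot->number = number;
	return slot;
}

static void slot_release(struct model_slot *slot)
{
	/* Safe even if the creator already dropped its reference. */
	assert(slot->bus->refcount > 0);
	printf("releasing slot %d on bus %s\n", slot->number, slot->bus->name);
	bus_put(slot->bus);
	free(slot);
}

/*
 * Tear down in the "wrong" order: the creator drops the bus first, the
 * slot is released afterwards.  Returns how many buses had been freed
 * at the moment the slot was about to be released, or -1 on allocation
 * failure.
 */
static int bus_freed_before_slot_release(void)
{
	int start = buses_freed;
	struct model_bus *bus = bus_create("0000:20");
	struct model_slot *slot;
	int freed;

	if (!bus)
		return -1;
	slot = slot_create(bus, 12);
	if (!slot) {
		bus_put(bus);
		return -1;
	}

	bus_put(bus);			/* hierarchy removal drops its reference */
	freed = buses_freed - start;	/* slot still pins the bus here */
	slot_release(slot);		/* last reference dropped, bus freed */
	return freed;
}
```

Calling bus_freed_before_slot_release() walks through the out-of-order
teardown described above: the bus is only freed once slot_release() drops
the last reference.  Without the bus_get() in slot_create(), the same
sequence would touch freed memory in slot_release().
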
> 
> Abridged stacktrace:
> 
>   pcieport 0000:00:07.0: PME: Signaling with IRQ 156
>   pcieport 0000:00:07.0: pciehp: Slot #12 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
>   pci_bus 0000:20: dev 00, created physical slot 12
>   pcieport 0000:00:07.0: pciehp: Slot(12): Card not present
>   ...
>   pcieport 0000:21:02.0: pciehp: pcie_disable_notification: SLOTCTRL d8 write cmd 0
>   Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
>   CPU: 13 UID: 0 PID: 134 Comm: irq/156-pciehp Not tainted 6.11.0-devel+ #1
>   RIP: 0010:dev_driver_string+0x12/0x40
>   pci_destroy_slot
>   pciehp_remove
>   pcie_port_remove_service
>   device_release_driver_internal
>   bus_remove_device
>   device_del
>   device_unregister
>   remove_iter
>   device_for_each_child
>   pcie_portdrv_remove
>   pci_device_remove
>   device_release_driver_internal
>   bus_remove_device
>   device_del
>   pci_remove_bus_device (recursive invocation)
>   pci_remove_bus_device
>   pciehp_unconfigure_device
>   pciehp_disable_slot
>   pciehp_handle_presence_or_link_change
>   pciehp_ist
> 
> Reported-by: Dennis Wassenberg <Dennis.Wassenberg@xxxxxxxxxxx>
> Tested-by: Dennis Wassenberg <Dennis.Wassenberg@xxxxxxxxxxx>
> Closes: https://lore.kernel.org/r/6de4b45ff2b32dd91a805ec02ec8ec73ef411bf6.camel@xxxxxxxxxxx/
> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>

Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>



