Re: UAF during boot on MTL based devices with attached dock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 03, 2024 at 03:46:55PM +0200, Lukas Wunner wrote:
> On Wed, Sep 25, 2024 at 03:38:34PM +0000, Wassenberg, Dennis wrote:
> > [    2.858063] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
> > [    2.858071] CPU: 13 UID: 0 PID: 137 Comm: irq/156-pciehp Not tainted 6.11.0-devel+ #3
> > [    2.858090] Hardware name: LENOVO 21LVS1CV00/21LVS1CV00, BIOS N45ET18W (1.08 ) 07/08/2024
> > [    2.858097] RIP: 0010:dev_driver_string+0x12/0x40
> > [    2.858111] Code: 5c c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 68 48 85 c0 74 08 <48> 8b 00 c3 cc cc cc cc 48 8b 47 60 48 85 c0 75 ef 48 8b 97 a8 02
> > [    2.858123] RSP: 0000:ffff9493009cfa00 EFLAGS: 00010202
> > [    2.858132] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8e53029cb918 RCX: 0000000000000000
> > [    2.858139] RDX: ffffffffa586b18a RSI: ffff8e53029cb918 RDI: ffff8e53029cb918
> > [    2.858144] RBP: ffff9493009cfb10 R08: 0000000000000000 R09: ffff8e5304f61000
> > [    2.858150] R10: ffff9493009cfb20 R11: 0000000000005627 R12: ffffffffa64db188
> > [    2.858156] R13: 6b6b6b6b6b6b6b6b R14: 0000000000000080 R15: ffff8e5302b1c0c0
> > [    2.858161] FS:  0000000000000000(0000) GS:ffff8e5a50140000(0000) knlGS:0000000000000000
> > [    2.858169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    2.858175] CR2: 0000000000000000 CR3: 000000030162e001 CR4: 0000000000f70ef0
> > [    2.858182] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [    2.858187] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> > [    2.858193] PKRU: 55555554
> > [    2.858196] Call Trace:
> [...]
> > [    2.858258]  __dynamic_dev_dbg+0x170/0x210
> > [    2.858287]  pci_destroy_slot+0x59/0x60
> > [    2.858296]  pciehp_remove+0x2e/0x50
> > [    2.858304]  pcie_port_remove_service+0x30/0x50
> > [    2.858311]  device_release_driver_internal+0x19f/0x200
> > [    2.858322]  bus_remove_device+0xc6/0x130
> > [    2.858335]  device_del+0x165/0x3f0
> > [    2.858348]  device_unregister+0x17/0x60
> > [    2.858355]  remove_iter+0x1f/0x30
> > [    2.858361]  device_for_each_child+0x6a/0xb0
> > [    2.858368]  pcie_portdrv_remove+0x2f/0x60
> > [    2.858374]  pci_device_remove+0x3f/0xa0
> > [    2.858383]  device_release_driver_internal+0x19f/0x200
> > [    2.858392]  bus_remove_device+0xc6/0x130
> > [    2.858398]  device_del+0x165/0x3f0
> > [    2.858413]  pci_remove_bus_device+0x91/0x140
> > [    2.858422]  pci_remove_bus_device+0x3e/0x140
> > [    2.858430]  pciehp_unconfigure_device+0x98/0x160
> > [    2.858439]  pciehp_disable_slot+0x69/0x130
> > [    2.858447]  pciehp_handle_presence_or_link_change+0x281/0x4c0
> > [    2.858456]  pciehp_ist+0x14a/0x150
> 
> Could you try the patch below and report back if it fixes the issue?

So the patch I sent you yesterday was based on the insight that
in the stack trace above, pci_destroy_slot() accesses slot->bus
after bus has been destroyed.

The slot is not a child of bus, so doesn't implicitly hold a reference
on bus.  The only kobjects holding references on bus are the child
devices of the bus, and they're destroyed before the bus gets destroyed.
The logical conclusion was that slot needs to hold a reference on bus
to avoid a use-after-free.

However looking at the stacktrace with a fresh pair of eyeballs,
I notice there's an oddity here:  The kernel normally unbinds drivers
from PCI devices in pci_stop_dev().  The actual removal of the devices
happens in a separate step, pci_remove_bus_device().

I note that in your stacktrace, driver unbinding happens from
pci_remove_bus_device().  That's unexpected.  The driver should
have already been unbound earlier in pci_stop_dev().

Perhaps the pcieport driver was rebound to the device after it was
already unbound.  The patch below should prevent that.

There are no messages though that would prove rebinding of a hotplug port.

The unplug event happens at the top of the hierarchy (below the Root Port).
So pci_bus_add_devices() binds the Root Port, its driver starts stopping
and removing the hierarchy below, all the while pci_bus_add_devices()
continues binding drivers to the child devices.

Could you try this patch (in addition to the one below and to the one
I sent yesterday):

https://lore.kernel.org/all/20241003084342.27501-1-brgl@xxxxxxxx/

It should prevent pci_bus_add_devices() from racing with pciehp stopping
and removing devices.

BTW, are you using a stock kernel or do you have any custom patches
(such as grsecurity) applied on top which might lead to a different
behavior than a stock kernel?

Thanks,

Lukas

-- >8 --

diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 387db92..1b38d79 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -36,6 +36,7 @@ static void pci_stop_dev(struct pci_dev *dev)
 	if (pci_dev_is_added(dev)) {
 		device_for_each_child(dev->dev.parent, dev_of_node(&dev->dev),
 				      pci_pwrctl_unregister);
+		dev->match_driver = false;
 		device_release_driver(&dev->dev);
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux