On Mon, Apr 09, 2018 at 04:04:44PM -0600, Keith Busch wrote: > The side effects of surprise removal may trigger AER handling. The AER > handling walks the pci topology and may access a pci_dev that is being > freed by the hotplug handler. > > This patch fixes that use-after-free by locking the PCI topology in > the AER handler so it isn't racing with the pciehp removal. > > Since the AER handler now runs under a global PCI lock, the rpc specific > mutex is no longer necessary. > > Reported-by: Alex Gagniuc <Alex_Gagniuc@xxxxxxxxxxxx> > Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx> > --- > drivers/pci/pcie/aer/aerdrv.c | 1 - > drivers/pci/pcie/aer/aerdrv.h | 5 ----- > drivers/pci/pcie/aer/aerdrv_core.c | 4 ++-- > 3 files changed, 2 insertions(+), 8 deletions(-) > > diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c > index 0b2eb88c422b..b88e5e2f3700 100644 > --- a/drivers/pci/pcie/aer/aerdrv.c > +++ b/drivers/pci/pcie/aer/aerdrv.c > @@ -237,7 +237,6 @@ static struct aer_rpc *aer_alloc_rpc(struct pcie_device *dev) > rpc->rpd = pci_dev_get(dev->port); > kref_init(&rpc->ref); > INIT_WORK(&rpc->dpc_handler, aer_isr); > - mutex_init(&rpc->rpc_mutex); > > /* Use PCIe bus function to store rpc into PCIe device */ > set_service_data(dev, rpc); > diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h > index f886521e2c7b..b90fc5d4cda2 100644 > --- a/drivers/pci/pcie/aer/aerdrv.h > +++ b/drivers/pci/pcie/aer/aerdrv.h > @@ -70,11 +70,6 @@ struct aer_rpc { > * Lock access to Error Status/ID Regs > * and error producer/consumer index > */ > - struct mutex rpc_mutex; /* > - * only one thread could do > - * recovery on the same > - * root port hierarchy > - */ > }; > > struct aer_broadcast_data { > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c > index e4059d7fa7fa..de210b7439eb 100644 > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -796,10 +796,10 @@ void aer_isr(struct work_struct *work) > struct aer_rpc *rpc = container_of(work, struct aer_rpc, dpc_handler); > struct aer_err_source uninitialized_var(e_src); > > - mutex_lock(&rpc->rpc_mutex); > + pci_lock_rescan_remove(); > while (get_e_source(rpc, &e_src)) > aer_isr_one_error(rpc, &e_src); > - mutex_unlock(&rpc->rpc_mutex); > + pci_unlock_rescan_remove(); I think this needs to be updated after Oza's patches, doesn't it? It looks like this would deadlock if I applied it to my current "next" branch as-is: aer_isr pci_lock_rescan_remove aer_isr_one_error aer_process_err_devices handle_error_source pcie_do_fatal_recovery pci_lock_rescan_remove <-- deadlock > aer_release(rpc); > } > -- > 2.14.3 >