A PCI device hot removal may occur while stdev->cdev is held open. The call
to stdev_release() then happens during close or exit, at a point way past
switchtec_pci_remove(). Otherwise the last ref would vanish with the
trailing put_device(), just before return.
At that later point in time, the devm cleanup has already removed the
stdev->mmio_mrpc mapping. Also, the stdev->pdev reference was not a counted
one. Therefore, in DMA mode, the iowrite32() in stdev_release() will cause
a fatal page fault, and the subsequent dma_free_coherent(), if reached,
would pass a stale &stdev->pdev->dev pointer.
Fix by moving MRPC DMA shutdown into switchtec_pci_remove(), after
stdev_kill(). Counting the stdev->pdev ref is now optional, but may prevent
future accidents.
Reproducible via the script at
https://lore.kernel.org/r/20231113212150.96410-1-dns@xxxxxxxxxx
Link: https://lore.kernel.org/r/20231113212150.96410-2-dns@xxxxxxxxxx
Signed-off-by: Daniel Stodden <dns@xxxxxxxxxx>
Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Reviewed-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
Reviewed-by: Dmitry Safonov <dima@xxxxxxxxxx>
---
...
@@ -1703,6 +1709,9 @@ static void switchtec_pci_remove(struct pci_dev *pdev) > ida_free(&switchtec_minor_ida, MINOR(stdev->dev.devt));
dev_info(&stdev->dev, "unregistered.\n");
stdev_kill(stdev);
+ switchtec_exit_pci(stdev); > + pci_dev_put(stdev->pdev); > + stdev->pdev = NULL; > put_device(&stdev->dev);
}
Hi,
does a similarswitchtec_exit_pci() should be called in the error handling path of switchtec_pci_probe() if an error occurs after switchtec_init_pci()?
CJ