> On Dec 7, 2023, at 1:08 PM, Christophe JAILLET <christophe.jaillet@xxxxxxxxxx> wrote: > >> A PCI device hot removal may occur while stdev->cdev is held open. The call >> to stdev_release() then happens during close or exit, at a point way past >> switchtec_pci_remove(). Otherwise the last ref would vanish with the >> trailing put_device(), just before return. >> At that later point in time, the devm cleanup has already removed the >> stdev->mmio_mrpc mapping. Also, the stdev->pdev reference was not a counted >> one. Therefore, in DMA mode, the iowrite32() in stdev_release() will cause >> a fatal page fault, and the subsequent dma_free_coherent(), if reached, >> would pass a stale &stdev->pdev->dev pointer. >> Fix by moving MRPC DMA shutdown into switchtec_pci_remove(), after >> stdev_kill(). Counting the stdev->pdev ref is now optional, but may prevent >> future accidents. >> Reproducible via the script at >> https://lore.kernel.org/r/20231113212150.96410-1-dns@xxxxxxxxxx >> Link: https://lore.kernel.org/r/20231113212150.96410-2-dns@xxxxxxxxxx >> Signed-off-by: Daniel Stodden <dns@xxxxxxxxxx> >> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> >> Reviewed-by: Logan Gunthorpe <logang@xxxxxxxxxxxx> >> Reviewed-by: Dmitry Safonov <dima@xxxxxxxxxx> > --- > > ... > >> @@ -1703,6 +1709,9 @@ static void switchtec_pci_remove(struct pci_dev *pdev) > ida_free(&switchtec_minor_ida, MINOR(stdev->dev.devt)); >> dev_info(&stdev->dev, "unregistered.\n"); >> stdev_kill(stdev); >> + switchtec_exit_pci(stdev); > + pci_dev_put(stdev->pdev); > + stdev->pdev = NULL; > put_device(&stdev->dev); >> } > Hi, > > does a similarswitchtec_exit_pci() should be called in the error handling path of switchtec_pci_probe() if an error occurs after switchtec_init_pci()? > Yep, that is correct. Looks like this is actually another regression resulting from evacuating stdev_release. Previous code could rely on the trailing put_device(), past err_put. No more. Cheers, Daniel