On Thu, 2020-06-25 at 14:58 -0500, Bjorn Helgaas wrote:
> [+cc Thomas]
>
> On Thu, Jun 25, 2020 at 12:24:49PM -0400, Jon Derrick wrote:
> > From: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
> >
> > The VMD domain does not subscribe to ACPI, and so does not operate on
> > its irqdomain fwnode. It was freeing the handle after allocation of the
> > domain. As of 181e9d4efaf6a (irqdomain: Make __irq_domain_add() less
> > OF-dependent), the fwnode is put during irq_domain_remove causing a page
> > fault. This patch keeps VMD's fwnode allocated through the lifetime of
> > the VMD irqdomain.
> >
> > Fixes: ae904cafd59d ("PCI/vmd: Create named irq domain")
> > Signed-off-by: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
> > Co-developed-by: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
> > Signed-off-by: Jon Derrick <jonathan.derrick@xxxxxxxxx>
> > ---
> > Hi Lorenzo, Bjorn,
> >
> > Please take this patch for v5.8 fixes. It fixes an issue during VMD
> > unload.
>
> I tentatively applied this to for-linus for v5.8.
>
> But I would like to clarify the commit log. It says this fixes
> Thomas' ae904cafd59d ("PCI/vmd: Create named irq domain"). That
> appeared in v4.13, which suggests that this patch should be backported
> to v4.13 and later.
>
> But it's not clear to me that ae904cafd59d is actually broken, since
> the log also says the problem appeared after 181e9d4efaf6 ("irqdomain:
> Make __irq_domain_add() less OF-dependent"), which we just merged for
> v5.8-rc1.
>
> And obviously, freeing the fwnode doesn't *cause* a page fault. A
> use-after-free might cause a fault, but it's not clear where that
> happens. I guess fwnode is used in the interval between:
>
>   vmd_enable_domain
>     pci_msi_create_irq_domain
>
>   ...   <-- fwnode used here somewhere
>
>   vmd_remove
>     vmd_cleanup_srcu
>     irq_domain_free_fwnode
>
> But I can't tell how 181e9d4efaf6a and/or ae904cafd59d are related to
> that.

The actual issue is that domain->fwnode was freed (and not set to NULL),
leading to a page fault here:

void irq_domain_remove(struct irq_domain *domain)
{
	...
	fwnode_handle_put(domain->fwnode);

But it's not obvious to me that VMD has a reason to free the fwnode if
there are other dependencies that rely on it existing.

>
> >  drivers/pci/controller/vmd.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > index e386d4eac407..ebec0a6e77ed 100644
> > --- a/drivers/pci/controller/vmd.c
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -546,9 +546,10 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> >
> >  	vmd->irq_domain = pci_msi_create_irq_domain(fn, &vmd_msi_domain_info,
> >  						    x86_vector_domain);
> > -	irq_domain_free_fwnode(fn);
> > -	if (!vmd->irq_domain)
> > +	if (!vmd->irq_domain) {
> > +		irq_domain_free_fwnode(fn);
> >  		return -ENODEV;
> > +	}
> >
> >  	pci_add_resource(&resources, &vmd->resources[0]);
> >  	pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
> > @@ -559,6 +560,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> >  	if (!vmd->bus) {
> >  		pci_free_resource_list(&resources);
> >  		irq_domain_remove(vmd->irq_domain);
> > +		irq_domain_free_fwnode(fn);
> >  		return -ENODEV;
> >  	}
> >
> > @@ -672,6 +674,7 @@ static void vmd_cleanup_srcu(struct vmd_dev *vmd)
> >  static void vmd_remove(struct pci_dev *dev)
> >  {
> >  	struct vmd_dev *vmd = pci_get_drvdata(dev);
> > +	struct fwnode_handle *fn = vmd->irq_domain->fwnode;
> >
> >  	sysfs_remove_link(&vmd->dev->dev.kobj, "domain");
> >  	pci_stop_root_bus(vmd->bus);
> > @@ -679,6 +682,7 @@ static void vmd_remove(struct pci_dev *dev)
> >  	vmd_cleanup_srcu(vmd);
> >  	vmd_detach_resources(vmd);
> >  	irq_domain_remove(vmd->irq_domain);
> > +	irq_domain_free_fwnode(fn);
> >  }
> >
> >  #ifdef CONFIG_PM_SLEEP
> > --
> > 2.18.1
> >
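
For anyone following the ordering question in this thread, here is a rough userspace sketch of the lifetime problem. It is only a model: model_fwnode, model_domain and the model_* functions are hypothetical stand-ins, not the kernel's types or APIs, and a "live" flag replaces a real free so that the bad ordering can be detected instead of faulting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's structures. */
struct model_fwnode {
	bool live;			/* false once the handle is "freed" */
};

struct model_domain {
	struct model_fwnode *fwnode;	/* plays the role of domain->fwnode */
};

/* Stand-in for irq_domain_free_fwnode(): mark the handle released. */
void model_free_fwnode(struct model_fwnode *fn)
{
	assert(fn != NULL);
	fn->live = false;
}

/*
 * Stand-in for irq_domain_remove(), which puts domain->fwnode.
 * Returns 0 if the handle is still valid, -1 if it was already
 * released (the case that would be a use-after-free in the kernel).
 */
int model_domain_remove(struct model_domain *d)
{
	if (!d->fwnode->live)
		return -1;
	d->fwnode = NULL;
	return 0;
}

/* Ordering before the patch: free right after creating the domain. */
int model_old_order(void)
{
	struct model_fwnode fn = { .live = true };
	struct model_domain d = { .fwnode = &fn };

	model_free_fwnode(&fn);		/* domain still points at fn */
	return model_domain_remove(&d);
}

/* Ordering with the patch: remove the domain, then free the fwnode. */
int model_new_order(void)
{
	struct model_fwnode fn = { .live = true };
	struct model_domain d = { .fwnode = &fn };
	struct model_fwnode *saved = d.fwnode;	/* as vmd_remove() saves fn */
	int ret = model_domain_remove(&d);

	model_free_fwnode(saved);
	return ret;
}
```

In this model, model_old_order() reports the stale handle at removal time, while model_new_order() removes the domain first and only then releases the saved handle, which is the ordering the patch adopts in vmd_remove().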