On Thu, 2022-09-22 at 11:33 -0300, Jason Gunthorpe wrote:
> On Thu, Sep 22, 2022 at 11:52:37AM +0200, Niklas Schnelle wrote:
> > Since commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
> > calls") we can end up with duplicates in the list of devices attached to
> > a domain. This is inefficient and confusing since only one domain can
> > actually be in control of the IOMMU translations for a device. Fix this
> > by detaching the device from the previous domain, if any, on attach.
> > Add a WARN_ON() in case we still have attached devices on freeing the
> > domain.
> >
> > Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
> > Signed-off-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> > ---
> > Changes since v1:
> > - WARN_ON() non-empty list in s390_domain_free()
> > - Drop the found flag and instead WARN_ON() if we're detaching
> >   from a domain that isn't the active domain for the device
> >
> >  drivers/iommu/s390-iommu.c | 81 ++++++++++++++++++++++----------------
> >  1 file changed, 46 insertions(+), 35 deletions(-)
> >
> > diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> > index c898bcbbce11..187d2c7ba9ff 100644
> > --- a/drivers/iommu/s390-iommu.c
> > +++ b/drivers/iommu/s390-iommu.c
> > @@ -78,19 +78,48 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
> >  static void s390_domain_free(struct iommu_domain *domain)
> >  {
> >  	struct s390_domain *s390_domain = to_s390_domain(domain);
> > +	unsigned long flags;
> >
> > +	spin_lock_irqsave(&s390_domain->list_lock, flags);
> > +	WARN_ON(!list_empty(&s390_domain->devices));
> > +	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
>
> Minor, but, this is about to free the memory holding the lock, we
> don't need to take it to do the WARN_ON.. list_empty() is already
> lockless safe.
>
> > static int __s390_iommu_detach_device(struct s390_domain *s390_domain,
> > 				       struct zpci_dev *zdev)
> > {
>
> This doesn't return a failure code anymore, make it void
>
> >  static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  				    struct device *dev)
> >  {
> >  	struct s390_domain *s390_domain = to_s390_domain(domain);
> >  	struct zpci_dev *zdev = to_zpci_dev(dev);
> >  	struct s390_domain_device *domain_device;
> > +	struct s390_domain *prev_domain = NULL;
> >  	unsigned long flags;
> > -	int cc, rc;
> > +	int cc, rc = 0;
> >
> >  	if (!zdev)
> >  		return -ENODEV;
> > @@ -99,16 +128,15 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  	if (!domain_device)
> >  		return -ENOMEM;
> >
> > -	if (zdev->dma_table && !zdev->s390_domain) {
> > -		cc = zpci_dma_exit_device(zdev);
> > -		if (cc) {
> > +	if (zdev->s390_domain) {
> > +		prev_domain = zdev->s390_domain;
> > +		rc = __s390_iommu_detach_device(zdev->s390_domain, zdev);
> > +	} else if (zdev->dma_table) {
> > +		if (zpci_dma_exit_device(zdev))
> >  			rc = -EIO;
> > -			goto out_free;
> > -		}
> >  	}
> > -
> > -	if (zdev->s390_domain)
> > -		zpci_unregister_ioat(zdev, 0);
> > +	if (rc)
> > +		goto out_free;
> >
> >  	zdev->dma_table = s390_domain->dma_table;
> >  	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
> > @@ -129,7 +157,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >  	    domain->geometry.aperture_end != zdev->end_dma) {
> >  		rc = -EINVAL;
> >  		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> > -		goto out_restore;
> > +		goto out_unregister_restore;
> >  	}
> >  	domain_device->zdev = zdev;
> >  	zdev->s390_domain = s390_domain;
> > @@ -138,14 +166,15 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
> >
> >  	return 0;
> >
> > +out_unregister_restore:
> > +	zpci_unregister_ioat(zdev, 0);
> >  out_restore:
> > -	if (!zdev->s390_domain) {
> > +	zdev->dma_table = NULL;
> > +	if (prev_domain)
> > +		s390_iommu_attach_device(&prev_domain->domain,
> > +					 dev);
>
> Huh. That is a surprising thing
>
> I think this function needs some re-ordering to avoid this condition
>
> The checks for aperture should be earlier, and they are not quite
> right. The aperture is only allowed to grow. If it starts out as 0 and
> then is set to something valid on first attach, a later attach cannot
> then shrink it. There could already be mappings in the domain under
> the now invalidated aperture and no caller is prepared to deal with
> this.
>
> That leaves the only error case as zpci_register_ioat() - which seems
> like it is the actual "attach" operation. Since
> __s390_iommu_detach_device() is just internal accounting (and can't
> fail) it should be moved after

I did miss a problem in my initial answer. While zpci_register_ioat() is
indeed the actual "attach" operation, it assumes that no DMA address
translations are registered at that point. In that state DMA is blocked,
of course. That means zpci_register_ioat() needs to come after the
zpci_unregister_ioat() that is part of __s390_iommu_detach_device() and
zpci_dma_exit_device().

If we do call those, though, we fundamentally need to restore the
previous domain / DMA API state on any subsequent failure. If we don't
restore, we would leave the device detached from any domain with DMA
blocked. I wonder if this could be an acceptable failure state though?
It's safe, as no DMA is possible, and we could get out of it with a
successful attach.
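
To make the "no restore" variant concrete, here is a small userspace
model of the ordering. This is only a sketch and not driver code: every
mock_* name and the fail_register knob are made up for illustration, and
the model assumes that unregistering translations cannot fail. It shows
that unregistering first and registering second leaves the device
detached with DMA blocked when the register step fails, and that a later
successful attach recovers from that state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; they do not exist in the s390 IOMMU driver. */
struct mock_domain {
	bool fail_register;	/* simulate a failing zpci_register_ioat() */
};

struct mock_dev {
	struct mock_domain *domain;	/* domain the device is accounted to */
	bool ioat_registered;		/* translations registered => DMA possible */
};

/* Models the register step: it assumes no translations are registered. */
static int mock_register_ioat(struct mock_dev *dev, struct mock_domain *dom)
{
	assert(!dev->ioat_registered);
	if (dom->fail_register)
		return -EIO;
	dev->ioat_registered = true;
	return 0;
}

/* Models the detach side: drop translations and accounting (cannot fail). */
static void mock_detach(struct mock_dev *dev)
{
	dev->ioat_registered = false;
	dev->domain = NULL;
}

/*
 * Attach without restore: unregister first, then register. On failure
 * the device is left detached from any domain with DMA blocked.
 */
static int mock_attach(struct mock_dev *dev, struct mock_domain *dom)
{
	int rc;

	if (dev->ioat_registered)
		mock_detach(dev);

	rc = mock_register_ioat(dev, dom);
	if (rc)
		return rc;	/* blocked state, no previous domain restored */

	dev->domain = dom;
	return 0;
}
```

In this model a failed attach to a second domain leaves
dev.domain == NULL and dev.ioat_registered == false, and a retry that
succeeds brings the device back into a usable state. Whether the IOMMU
core callers can tolerate that outcome is exactly the open question
above.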