On Sun, Nov 19, 2023 at 06:19:43PM +0900, Hector Martin wrote: > >> +static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec, > >> + struct device *dev, > >> + struct fwnode_handle *iommu_fwnode) > >> +{ > >> + const struct iommu_ops *ops; > >> + > >> + if (fwspec->iommu_fwnode) { > >> + /* > >> + * fwspec->iommu_fwnode is the first iommu's fwnode. In the rare > >> + * case of multiple iommus for one device they must point to the > >> + * same driver, checked via same ops. > >> + */ > >> + ops = iommu_ops_from_fwnode(iommu_fwnode); > > > > This carries over a related bug from the original code: If a device has > > two IOMMUs and the first one probes but the second one defers, ops will > > be NULL here and the check will fail with EINVAL. > > > > Adding a check for that case here fixes it: > > > > if (!ops) > > return driver_deferred_probe_check_state(dev); Yes! > > With that, for the whole series: > > > > Tested-by: Hector Martin <marcan@xxxxxxxxx> > > > > I can't specifically test for the probe races the series intends to fix > > though, since that bug we only hit extremely rarely. I'm just testing > > that nothing breaks. > > Actually no, this fix is not sufficient. If the first IOMMU is ready > then the xlate path allocates dev->iommu, which then > __iommu_probe_device takes as a sign that all IOMMUs are ready and does > the device init. It doesn't.. The code there is: if (!fwspec && dev->iommu) fwspec = dev->iommu->fwspec; if (fwspec) ops = fwspec->ops; else ops = dev->bus->iommu_ops; if (!ops) { ret = -ENODEV; goto out_unlock; } Which is sensitive only to !NULL fwspec, and if EPROBE_DEFER is returned fwspec will be freed and dev->iommu->fwspec will be NULL here. In the NULL case it does a 'bus probe' with a NULL fwspec and all the fwspec drivers return immediately from their probe functions. Did I miss something? > Then when the xlate comes along again after suceeding > with the second IOMMU, __iommu_probe_device sees the device is already > in a group and never initializes the second IOMMU, leaving the device > with only one IOMMU. This should be fixed by the first hunk to check every iommu and fail? BTW, do you have a systems with same device attached to multiple iommus? I've noticed another bug here, many drivers don't actually support differing iommu instances and nothing seems to check it.. Thanks, Jason