On Fri, Feb 02, 2024 at 12:15:40PM -0400, Jason Gunthorpe wrote: > > Yes looks like a race of some sort. Adding a bit of debug also makes the > > issue go away so difficult to see what is happening. > > I'm wondering if it is racing with iommu driver probing? I looked but > didn't notice anything obviously wrong there that would cause this > though. Oh there is at least one racy thing here.. The of_xlate hackjob is out of order and is racy if you have multiple instances: struct tegra_smmu *tegra_smmu_probe(struct device *dev, const struct tegra_smmu_soc *soc, struct tegra_mc *mc) { /* * This is a bit of a hack. Ideally we'd want to simply return this * value. However iommu_device_register() will attempt to add * all devices to the IOMMU before we get that far. In order * not to rely on global variables to track the IOMMU instance, we * set it here so that it can be looked up from the .probe_device() * callback via the IOMMU device's .drvdata field. */ mc->smmu = smmu; ^^^^^^^^^^^^^ After this of_xlate and probe_device will succeed [...] err = iommu_device_sysfs_add(&smmu->iommu, dev, NULL, dev_name(dev)); err = iommu_device_register(&smmu->iommu, &tegra_smmu_ops, dev); ^^^^^^^^^ But the iommu instance is not fully initialized yet. So: static int tegra_smmu_of_xlate(struct device *dev, struct of_phandle_args *args) { struct platform_device *iommu_pdev = of_find_device_by_node(args->np); struct tegra_mc *mc = platform_get_drvdata(iommu_pdev); dev_iommu_priv_set(dev, mc->smmu); ^^^^^^^^^^^^ Gets the partially initialized iommu instance static struct iommu_device *tegra_smmu_probe_device(struct device *dev) { smmu = dev_iommu_priv_get(dev); if (!smmu) return ERR_PTR(-ENODEV); ^^^^^^^^^^^^^ Allows the driver to bind to a partially setup instance ie if you have multiple instances of tegra-smmu and you manage to do concurrent probe where the iommu instance is probing concurrently with the failing device_add flow then you can a situation like you have described where the sysfs is not fully setup. Is this making sense to you? Add a sleep after the mc->smmu store and confirm with printing? I think this is all an insane design. I fixed this race and removed all this hackery in my fwspec removal series. There the iommu instance only ever comes out of the locked list that iommu_device_register() populates and drivers have a safe and simple flow. Maybe just moving the store to mc->smmu later would improve it? I didn't look closely.. Jason