On Tue, Feb 13, 2024 at 01:18:24AM +0200, Dmitry Baryshkov wrote: > > > For some reason this patch breaks booting of the APQ8096 Dragonboard820c > > > (qcom/apq8096-db820c.dts). Dispbling display subsystem (mdss) and venus > > > devices makes the board boot in most of the cases. Most frequently the > > > last parts of the log loog in a following way: > > > > It is surprising we tested this patch on some tegra systems with this > > iommu and didn't hit anything.. > > > > The only real functional thing this changes is to move the domain > > initialization up in time, potentially a lot in time in some > > cases. That function does alot of things including touching HW so > > possibly there is some surprising interaction with something else. > > > > So, I would expect this to not WARN_ON and to make it work the same as > > before the patch: > > > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > > @@ -875,7 +875,9 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) > > mutex_init(&smmu_domain->init_mutex); > > spin_lock_init(&smmu_domain->cb_lock); > > > > - if (dev) { > > + WARN_ON(using_legacy_binding); > > + > > +/* if (dev) { > > struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev); > > > > if (arm_smmu_init_domain_context(smmu_domain, cfg->smmu, dev)) { > > @@ -883,7 +885,7 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) > > return NULL; > > } > > } > > - > > +*/ > > return &smmu_domain->domain; > > } > > With the full device tree, this crashed during the IOMMU probe, no warnings: The above reverts nearly all the functional elements of the patch you said caused the problem, are you certain of your bisection? > > And then we may get a clue from the backtraces it generates. I only > > saw one iommu group reported in your log so I'd expect one trace? > > With the full device tree, same result: This adds basically an unconditional WARN_ON on all the probe paths, and nothing printed? That is even more surprising. Those two together suggest that arm_smmu_domain_alloc_paging() isn't even being called. But the caller is: if (alloc_type == IOMMU_DOMAIN_IDENTITY && ops->identity_domain) return ops->identity_domain; else if (alloc_type == IOMMU_DOMAIN_BLOCKED && ops->blocked_domain) return ops->blocked_domain; else if (type & __IOMMU_DOMAIN_PAGING && ops->domain_alloc_paging) domain = ops->domain_alloc_paging(dev); else if (ops->domain_alloc) domain = ops->domain_alloc(alloc_type); else return ERR_PTR(-EOPNOTSUPP); Which, suggest that alloc_type is somehow garbage for your system? I don't see how that can happen, but this patch will also cause a garbage type to be rejected by the core code. Does this reveal anything for you? @@ -2118,6 +2118,7 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops, struct iommu_domain *domain; unsigned int alloc_type = type & IOMMU_DOMAIN_ALLOC_FLAGS; + WARN(true, " __iommu_domain_alloc %u %u", alloc_type, type); if (alloc_type == IOMMU_DOMAIN_IDENTITY && ops->identity_domain) return ops->identity_domain; else if (alloc_type == IOMMU_DOMAIN_BLOCKED && ops->blocked_domain) @@ -2126,8 +2127,10 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops, domain = ops->domain_alloc_paging(dev); else if (ops->domain_alloc) domain = ops->domain_alloc(alloc_type); - else + else { + printk("Returning failure from __iommu_domain_alloc()\n"); return ERR_PTR(-EOPNOTSUPP); + } /* * Many domain_alloc ops now return ERR_PTR, make things easier for the It must print, something is wrong with the debugging process if this doesn't generate back traces :\ Jason