On Sat, 10 Feb 2024 at 00:23, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > On Fri, Feb 09, 2024 at 10:05:38PM +0200, Dmitry Baryshkov wrote: > > On Tue, 17 Oct 2023 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > Now that the BLOCKED and IDENTITY behaviors are managed with their own > > > domains change to the domain_alloc_paging() op. > > > > > > The check for using_legacy_binding is now redundant, > > > arm_smmu_def_domain_type() always returns IOMMU_DOMAIN_IDENTITY for this > > > mode, so the core code will never attempt to create a DMA domain in the > > > first place. > > > > > > Since commit a4fdd9762272 ("iommu: Use flush queue capability") the core > > > code only passes in IDENTITY/BLOCKED/UNMANAGED/DMA domain types. It will > > > not pass in IDENTITY or BLOCKED if the global statics exist, so the test > > > for DMA is also redundant now too. > > > > > > Call arm_smmu_init_domain_context() early if a dev is available. > > > > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > > > --- > > > drivers/iommu/arm/arm-smmu/arm-smmu.c | 21 +++++++++++++++------ > > > 1 file changed, 15 insertions(+), 6 deletions(-) > > > > For some reason this patch breaks booting of the APQ8096 Dragonboard820c > > (qcom/apq8096-db820c.dts). Dispbling display subsystem (mdss) and venus > > devices makes the board boot in most of the cases. Most frequently the > > last parts of the log loog in a following way: > > It is surprising we tested this patch on some tegra systems with this > iommu and didn't hit anything.. > > The only real functional thing this changes is to move the domain > initialization up in time, potentially a lot in time in some > cases. That function does alot of things including touching HW so > possibly there is some surprising interaction with something else. > > So, I would expect this to not WARN_ON and to make it work the same as > before the patch: > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > @@ -875,7 +875,9 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) > mutex_init(&smmu_domain->init_mutex); > spin_lock_init(&smmu_domain->cb_lock); > > - if (dev) { > + WARN_ON(using_legacy_binding); > + > +/* if (dev) { > struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev); > > if (arm_smmu_init_domain_context(smmu_domain, cfg->smmu, dev)) { > @@ -883,7 +885,7 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) > return NULL; > } > } > - > +*/ > return &smmu_domain->domain; > } With the full device tree, this crashed during the IOMMU probe, no warnings: [ 15.343477] bam-dma-engine 644000.dma-controller: num-channels unspecified in dt [ 15.343675] bam-dma-engine 644000.dma-controller: num-ees unspecified in dt [ 15.394653] msm_serial 7570000.serial: msm_serial: detected port #2 [ 15.395170] msm_serial 7570000.serial: uartclk = 19200000 [ 15.401983] 7570000.serial: ttyMSM2 at MMIO 0x7570000 (irq = 37, base_baud = 1200000) is a MSM [ 15.407549] serial serial0: tty port ttyMSM2 registered [ 15.421567] Bluetooth: hci0: setting up ROME/QCA6390 [ 15.421728] arm-smmu b40000.iommu: probing hardware configuration... [ 15.425866] arm-smmu b40000.iommu: SMMUv2 with: [ 15.432153] arm-smmu b40000.iommu: stage 1 translation [ 15.436407] arm-smmu b40000.iommu: address translation ops [ 15.441580] arm-smmu b40000.iommu: non-coherent table walk [ 15.447136] arm-smmu b40000.iommu: (IDR0.CTTW overridden by FW configuration) [ 15.452750] arm-smmu b40000.iommu: stream matching with 2 register groups [ 15.460038] arm-smmu b40000.iommu: 2 context banks (0 stage-2 only) [ 15.466908] arm-smmu b40000.iommu: Supported page sizes: 0x63315000 [ 15.473367] arm-smmu b40000.iommu: Stage-1: 48-bit VA -> 36-bit IPA [ 15.481638] arm-smmu b40000.iommu: preserved 0 boot mappings [ 15.491783] arm-smmu d00000.iommu: probing hardware configuration... [ 15.491955] arm-smmu d00000.iommu: SMMUv2 with: [ Format: Log Type - Time(microsec) - Message - Optional Info Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic S - QC_IMAGE_VERSION_STRING=BOOT.XF.1.0-00301 > > Then I'd ask you to remove the comment and do: > > @@ -878,7 +878,9 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) > if (dev) { > struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev); > > + WARN_ON(true); > if (arm_smmu_init_domain_context(smmu_domain, cfg->smmu, dev)) { > + printk("Allocation failure in arm_smmu_domain_alloc_paging()\n"); > kfree(smmu_domain); > return NULL; > } > > > And then we may get a clue from the backtraces it generates. I only > saw one iommu group reported in your log so I'd expect one trace? With the full device tree, same result: [ 12.319303] bam-dma-engine 644000.dma-controller: num-channels unspecified in dt [ 12.319502] bam-dma-engine 644000.dma-controller: num-ees unspecified in dt [ 12.370379] msm_serial 7570000.serial: msm_serial: detected port #2 [ 12.370895] msm_serial 7570000.serial: uartclk = 19200000 [ 12.377773] 7570000.serial: ttyMSM2 at MMIO 0x7570000 (irq = 37, base_baud = 1200000) is a MSM [ 12.383228] serial serial0: tty port ttyMSM2 registered [ 12.397263] arm-smmu b40000.iommu: probing hardware configuration... [ 12.397441] arm-smmu b40000.iommu: SMMUv2 with: [ 12.398072] Bluetooth: hci0: setting up ROME/QCA6390 [ 12.402779] arm-smmu b40000.iommu: stage 1 translation [ 12.402817] arm-smmu b40000.iommu: address translation ops [ 12.402832] arm-smmu b40000.iommu: non-coherent table walk [ 12.402848] arm-smmu b40000.iommu: (IDR0.CTTW overridden by FW configuration) [ 12.402881] arm-smmu b40000.iommu: stream matching with 2 register groups [ 12.402943] arm-smmu b40000.iommu: 2 context banks (0 stage-2 only) [ 12.402987] arm-smmu b40000.iommu: Supported page sizes: 0x63315000 [ 12.403004] arm-smmu b40000.iommu: Stage-1: 48-bit VA -> 36-bit IPA [ 12.404941] arm-smmu b40000.iommu: preserved 0 boot mappings [ 12.467015] arm-smmu d00000.iommu: probing hardware configuration... [ 12.467318] arm-smmu d0 Format: Log Type - Time(microsec) - Message - Optional Info Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic S - QC_IMAGE_VERSION_STRING=BOOT.XF.1.0-00301 Even with disabling display-subsystem and venus, it crashes at the same place, I don't seem to be able to get past it anymore. -- With best wishes Dmitry