On Sat, 10 Feb 2024 at 00:23, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Fri, Feb 09, 2024 at 10:05:38PM +0200, Dmitry Baryshkov wrote:
> > On Tue, 17 Oct 2023 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > > Now that the BLOCKED and IDENTITY behaviors are managed with their own
> > > domains change to the domain_alloc_paging() op.
> > >
> > > The check for using_legacy_binding is now redundant,
> > > arm_smmu_def_domain_type() always returns IOMMU_DOMAIN_IDENTITY for this
> > > mode, so the core code will never attempt to create a DMA domain in the
> > > first place.
> > >
> > > Since commit a4fdd9762272 ("iommu: Use flush queue capability") the core
> > > code only passes in IDENTITY/BLOCKED/UNMANAGED/DMA domain types. It will
> > > not pass in IDENTITY or BLOCKED if the global statics exist, so the test
> > > for DMA is also redundant now too.
> > >
> > > Call arm_smmu_init_domain_context() early if a dev is available.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > ---
> > >  drivers/iommu/arm/arm-smmu/arm-smmu.c | 21 +++++++++++++++------
> > >  1 file changed, 15 insertions(+), 6 deletions(-)
> >
> > For some reason this patch breaks booting of the APQ8096 Dragonboard820c
> > (qcom/apq8096-db820c.dts). Disabling the display subsystem (mdss) and venus
> > devices makes the board boot in most cases. Most frequently the
> > last parts of the log look like this:
>
> It is surprising we tested this patch on some tegra systems with this
> iommu and didn't hit anything..
>
> The only real functional thing this changes is to move the domain
> initialization up in time, potentially a lot in time in some
> cases. That function does a lot of things including touching HW so
> possibly there is some surprising interaction with something else.

I should not be debugging strange platforms at 1 a.m. I forgot that
there was another patch to revert.
So after reverting the MPM patch, I'm getting the following results:

>
> So, I would expect this to not WARN_ON and to make it work the same as
> before the patch:

No warnings. The platform now boots up to the point of actually
bringing up the venus device:

[ 11.906514] ath10k_pci 0000:01:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 0000:0000
[ 11.907119] ath10k_pci 0000:01:00.0: kconfig debug 1 debugfs 0 tracing 0 dfs 0 testmode 0
[ 11.915881] ath10k_pci 0000:01:00.0: firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp crc32 bf907c7c
[ 11.979972] Console: switching to colour frame buffer device 320x90
[ 11.990756] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 d2863f91
[ 12.060834] msm_mdp 901000.display-controller: [drm] fb0: msmdrmfb frame buffer device
[ 12.096203] qcom-pcie 608000.pcie: Phy link never came up
[ 12.103785] qcom-pcie 608000.pcie: PCI host bridge to bus 0001:00
[ 12.103970] qcom-venus c00000.video-codec: Adding to iommu group 3

Format: Log Type - Time(microsec) - Message - Optional Info
Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
S - QC_IMAGE_VERSION_STRING=BOOT.XF.1.0-00301
S - IMAGE_VARIANT_STRING=M8996LAB
S - OEM_IMAGE_VERSION_STRING=crm-ubuntu68
S - Boot Interface: UFS

>
> Then I'd ask you to remove the comment and do:
>
> @@ -878,7 +878,9 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev)
>  	if (dev) {
>  		struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
>
> +		WARN_ON(true);
>  		if (arm_smmu_init_domain_context(smmu_domain, cfg->smmu, dev)) {
> +			printk("Allocation failure in arm_smmu_domain_alloc_paging()\n");
>  			kfree(smmu_domain);
>  			return NULL;
>  		}
>
>
> And then we may get a clue from the backtraces it generates. I only
> saw one iommu group reported in your log so I'd expect one trace?

I added dev_info() + mdelay() calls around arm_smmu_init_domain_context()
and I can see that it crashes within that function.
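The debug change itself was not posted. A minimal sketch of what such instrumentation might look like, assuming it sits on top of the WARN_ON()/printk() hunk quoted above: the "Allocating domain" string matches the log below, while the second message and the 100 ms delay value are hypothetical. mdelay() comes from <linux/delay.h>; the busy-wait gives the serial console time to drain the breadcrumb before a possible hard reset.

```c
	if (dev) {
		struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);

		WARN_ON(true);
		/* Breadcrumb before the suspect call; the delay (hypothetical
		 * value) lets the console flush in case the board resets. */
		dev_info(dev, "Allocating domain\n");
		mdelay(100);
		if (arm_smmu_init_domain_context(smmu_domain, cfg->smmu, dev)) {
			printk("Allocation failure in arm_smmu_domain_alloc_paging()\n");
			kfree(smmu_domain);
			return NULL;
		}
		/* Hypothetical second breadcrumb: if it never shows up,
		 * the reset happened inside the call above. */
		dev_info(dev, "Domain allocated\n");
		mdelay(100);
	}
```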
[ 29.819624] qcom-venus c00000.video-codec: Adding to iommu group 1
[ 29.833181] ------------[ cut here ]------------
[ 29.839198] WARNING: CPU: 1 PID: 35 at drivers/iommu/arm/arm-smmu/arm-smmu.c:883 arm_smmu_domain_alloc_paging+0x80/0x174
[ 29.843980] Modules linked in:
[ 29.854824] CPU: 1 PID: 35 Comm: kworker/u18:0 Tainted: G U 6.8.0-rc3-next-20240208-05495-g20708c29957d-dirty #1739
[ 29.857694] Hardware name: Qualcomm Technologies, Inc. DB820c (DT)
[ 29.869410] Workqueue: events_unbound deferred_probe_work_func
[ 29.875658] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 29.881474] pc : arm_smmu_domain_alloc_paging+0x80/0x174
[ 29.888331] lr : arm_smmu_domain_alloc_paging+0x68/0x174
[ 29.893885] sp : ffff8000830338c0
[ 29.899179] x29: ffff8000830338c0 x28: 0000000000000000 x27: ffff800081e72000
[ 29.902396] x26: ffff00008034ee48 x25: ffff000080b24810 x24: 0000000000000000
[ 29.909513] x23: ffff800081e73000 x22: ffff000080b24810 x21: ffff800082e23258
[ 29.916633] x20: ffff00008389a700 x19: ffff00008034f600 x18: ffffffffffffffff
[ 29.918788] usb 1-1: new high-speed USB device number 2 using xhci-hcd
[ 29.923746] x17: 0000000c0000000b x16: 0000000900000008 x15: 0000000000000000
[ 29.923765] x14: 000000000000b0af x13: 0000000000000000 x12: 0000000000000166
[ 29.923783] x11: 0000000000000001 x10: 0000000000001410 x9 : 0000000000000000
[ 29.923801] x8 : ffff00008034f800 x7 : 0000000000000000 x6 : 0000000000000000
[ 29.923819] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 0000000000000000
[ 29.923837] x2 : ffff800082e23290 x1 : dead4ead00000000 x0 : 0000000000000000
[ 29.923855] Call trace:
[ 29.923861] arm_smmu_domain_alloc_paging+0x80/0x174
[ 29.923872] __iommu_domain_alloc+0xcc/0xf4
[ 29.923884] iommu_setup_default_domain+0x294/0x554
[ 29.938567] Bluetooth: hci0: Frame reassembly failed (-84)
[ 29.944494] __iommu_probe_device+0x418/0x43c
[ 29.944508] iommu_probe_device+0x3c/0x80
[ 29.944519] of_iommu_configure+0x124/0x1b4
[ 29.944529] of_dma_configure_id+0x170/0x2f4
[ 29.969874] mmc0: new ultra high speed SDR104 SDHC card at address 5048
[ 29.972966] platform_dma_configure+0xa8/0xb4
[ 29.972983] really_probe+0x70/0x2ac
[ 29.972992] __driver_probe_device+0x78/0x12c
[ 29.973001] driver_probe_device+0xd8/0x160
[ 29.973010] __device_attach_driver+0xb8/0x138
[ 29.973019] bus_for_each_drv+0x80/0xdc
[ 29.973027] __device_attach+0x9c/0x188
[ 29.973037] device_initial_probe+0x14/0x20
[ 29.973046] bus_probe_device+0xac/0xb0
[ 29.973055] deferred_probe_work_func+0x8c/0xc8
[ 29.973064] process_one_work+0x210/0x5e4
[ 29.983596] mmcblk0: mmc0:5048 SD32G 28.8 GiB
[ 29.987546] worker_thread+0x1bc/0x38c
[ 29.987558] kthread+0x120/0x124
[ 29.987568] ret_from_fork+0x10/0x20
[ 29.987579] irq event stamp: 109977
[ 29.987584] hardirqs last enabled at (109977): [<ffff800080fbbc48>] _raw_spin_unlock_irqrestore+0x6c/0x70
[ 29.987600] hardirqs last disabled at (109976): [<ffff800080fbb0a8>] _raw_spin_lock_irqsave+0x84/0x88
[ 29.987610] softirqs last enabled at (109966): [<ffff800080090680>] __do_softirq+0x498/0x4e0
[ 29.987619] softirqs last disabled at (109961): [<ffff800080096184>] ____do_softirq+0x10/0x1c
[ 30.006747] mmcblk0: p1
[ 30.010291] ---[ end trace 0000000000000000 ]---
[ 30.018630] remoteproc remoteproc1: remote processor 9300000.remoteproc is now up
[ 30.024525] qcom-pcie 600000.pcie: iATU: unroll F, 32 ob, 8 ib, align 4K, limit 4G
[ 30.044747] qcom,apr remoteproc1:smd-edge.apr_audio_svc.-1.-1: Adding APR/GPR dev: aprsvc:service:4:3
[ 30.046118] qcom-pcie 600000.pcie: Invalid eDMA IRQs found
[ 30.051718] qcom,apr remoteproc1:smd-edge.apr_audio_svc.-1.-1: Adding APR/GPR dev: aprsvc:service:4:4
[ 30.066435] Bluetooth: hci0: QCA Downloading qca/nvm_00440302.bin
[ 30.130736] hub 1-1:1.0: USB hub found
[ 30.150390] qcom-pcie 600000.pcie: PCIe Gen.1 x1 link up
[ 30.156394] hub 1-1:1.0: 4 ports detected
[ 30.161837] qcom-pcie 600000.pcie: PCI host bridge to bus 0000:00
[ 30.189583] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 30.195652] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff]
[ 30.201035] pci_bus 0000:00: root bus resource [mem 0x0c300000-0x0cffffff]
[ 30.205424] Bluetooth: hci0: QCA setup on UART is completed
[ 30.207262] pci 0000:00:00.0: [17cb:0104] type 01 class 0x060400 PCIe Root Port
[ 30.214380] usb 2-1: new SuperSpeed USB device number 2 using xhci-hcd
[ 30.219636] qcom-venus c00000.video-codec: Allocating domain
[ 30.221503] pci 0000:00:00.0: BAR 0 [mem 0x00000000-0x00000fff]
[ 30.221680] pci 0000:00:00.0: PCI bridge to [bus 01-ff]
[ 30.221772] pci 0000:00:00.0: bridge window [io 0x0000-0x0fff]
[ 30.221832] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff]
[ 30.221945] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref]
[ 30.222617] pci 0000:00:00.0: PME# supported from D0 D3hot
[ 30.273673] hub 2-1:1.0: USB hub found
[ 30.276567] hub 2-1:1.0: 4 ports detected

Format: Log Type - Time(microsec) - Message - Optional Info
Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic
S - QC_IMAGE_VERSION_STRING=BOOT.XF.1.0-00301
S - IMAGE_VARIANT_STRING=M8996LAB
S - OEM_IMAGE_VERSION_STRING=crm-ubuntu68
S - Boot Interface: UFS

I traced this further: it crashes during arm_smmu_write_context_bank().

-- 
With best wishes
Dmitry