On Tue, 13 Feb 2024 at 12:20, Robin Murphy <robin.murphy@xxxxxxx> wrote: > > On 2024-02-13 7:51 am, Dmitry Baryshkov wrote: > > On Sat, 10 Feb 2024 at 00:23, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > >> > >> On Fri, Feb 09, 2024 at 10:05:38PM +0200, Dmitry Baryshkov wrote: > >>> On Tue, 17 Oct 2023 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > >>>> Now that the BLOCKED and IDENTITY behaviors are managed with their own > >>>> domains change to the domain_alloc_paging() op. > >>>> > >>>> The check for using_legacy_binding is now redundant, > >>>> arm_smmu_def_domain_type() always returns IOMMU_DOMAIN_IDENTITY for this > >>>> mode, so the core code will never attempt to create a DMA domain in the > >>>> first place. > >>>> > >>>> Since commit a4fdd9762272 ("iommu: Use flush queue capability") the core > >>>> code only passes in IDENTITY/BLOCKED/UNMANAGED/DMA domain types. It will > >>>> not pass in IDENTITY or BLOCKED if the global statics exist, so the test > >>>> for DMA is also redundant now too. > >>>> > >>>> Call arm_smmu_init_domain_context() early if a dev is available. > >>>> > >>>> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > >>>> --- > >>>> drivers/iommu/arm/arm-smmu/arm-smmu.c | 21 +++++++++++++++------ > >>>> 1 file changed, 15 insertions(+), 6 deletions(-) > >>> > >>> For some reason this patch breaks booting of the APQ8096 Dragonboard820c > >>> (qcom/apq8096-db820c.dts). Dispbling display subsystem (mdss) and venus > >>> devices makes the board boot in most of the cases. Most frequently the > >>> last parts of the log loog in a following way: > >> > >> It is surprising we tested this patch on some tegra systems with this > >> iommu and didn't hit anything.. > >> > >> The only real functional thing this changes is to move the domain > >> initialization up in time, potentially a lot in time in some > >> cases. That function does alot of things including touching HW so > >> possibly there is some surprising interaction with something else. > > > > I should not be debugging strange platforms at 1 a.m. I forgot that > > there was another patch to revert. So after reverting the MPM patch, > > I'm getting the following results: > > > >> > >> So, I would expect this to not WARN_ON and to make it work the same as > >> before the patch: > > > > No warnings, the platform now boots up to the point of actually > > bringing up the venus device: > > > > > > [ 11.906514] ath10k_pci 0000:01:00.0: qca6174 hw3.2 target > > 0x05030000 chip_id 0x00340aff sub 0000:0000 > > [ 11.907119] ath10k_pci 0000:01:00.0: kconfig debug 1 debugfs 0 > > tracing 0 dfs 0 testmode 0 > > [ 11.915881] ath10k_pci 0000:01:00.0: firmware ver > > WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp crc32 > > bf907c7c > > [ 11.979972] Console: switching to colour frame buffer device 320x90 > > [ 11.990756] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 > > crc32 d2863f91 > > [ 12.060834] msm_mdp 901000.display-controller: [drm] fb0: msmdrmfb > > frame buffer device > > [ 12.096203] qcom-pcie 608000.pcie: Phy link never came up > > [ 12.103785] qcom-pcie 608000.pcie: PCI host bridge to bus 0001:00 > > [ 12.103970] qcom-venus c00000.video-codec: Adding to iommu group 3 > > > > Format: Log Type - Time(microsec) - Message - Optional Info > > Log Type: B - Since Boot(Power On Reset), D - Delta, S - Statistic > > S - QC_IMAGE_VERSION_STRING=BOOT.XF.1.0-00301 > > S - IMAGE_VARIANT_STRING=M8996LAB > > S - OEM_IMAGE_VERSION_STRING=crm-ubuntu68 > > S - Boot Interface: UFS > > > >> > >> Then I'd ask you to remove the comment and do: > >> > >> @@ -878,7 +878,9 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) > >> if (dev) { > >> struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev); > >> > >> + WARN_ON(true); > >> if (arm_smmu_init_domain_context(smmu_domain, cfg->smmu, dev)) { > >> + printk("Allocation failure in arm_smmu_domain_alloc_paging()\n"); > >> kfree(smmu_domain); > >> return NULL; > >> } > >> > >> > >> And then we may get a clue from the backtraces it generates. I only > >> saw one iommu group reported in your log so I'd expect one trace? > > > > I added dev_info + mdelays() around the arm_smmu_init_domain_context() > > and I can see that it crashes within that function. > > Yeah, this is totally broken. We can't just call the unmodified > arm_smmu_init_domain_context() at domain allocation because half of what > it's doing belongs to the attach operation. We should not be allocating > context banks, IRQs, etc. for a not-yet-attached domain, and we > certainly shouldn't be touching hardware there outside of RPM. Should I send a revert? > > Thanks, > Robin. -- With best wishes Dmitry