Hi Robin, On Tue, Dec 19, 2017 at 04:34:46PM +0000, Robin Murphy wrote: > Hi Tomasz, > > On 19/12/17 15:13, Tomasz Nowicki wrote: > >Here is my lspci output of ThunderX2 for which I am observing kernel panic coming from > >SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live): > > > ># lspci -vt > >-[0000:00]-+-00.0-[01-1f]--+ [...] > > + [...] > > \-00.0-[1e-1f]----00.0-[1f]----00.0 ASPEED Technology, Inc. ASPEED Graphics Family > > > >ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family > >PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge > >While setting up ASP device SID in IORT dirver: > >iort_iommu_configure() -> pci_for_each_dma_alias() > >we need to walk up and iterate over each device which alias transaction from > >downstream devices. > > > >AST device (1f:00.0) gets BDF=0x1f00 and corresponding SID=0x1f00 from IORT. > >Bridge (1e:00.0) is the first alias. Following PCI Express to PCI/PCI-X Bridge > >spec: PCIe-to-PCI/X bridges alias transactions from downstream devices using > >the subordinate bus number. For bridge (1e:00.0), the subordinate is equal > >to 0x1f. This gives BDF=0x1f00 and SID=1f00 which is the same as downstream > >device. So it is possible to have two identical SIDs. The question is what we > >should do about such case. Presented patch prevents from registering the same > >ID so that SMMUv3 is not complaining later on. > > Ooh, subtle :( There is logic in arm_smmu_attach_device() to tolerate > grouped devices aliasing to the same ID, but I guess I overlooked the > distinction of a device sharing an alias ID with itself. I'm not sure > I really like trying to work around this in generic code, since > fwspec->ids is essentially opaque data in a driver-specific format - in > theory a driver is free to encode a single logical ID into multiple > fwspec elements (I think I did that in an early draft of SMMUv2 SMR > support), at which point this approach might corrupt things massively. > > Does the (untested) diff below suffice? > > Robin. > > ----->8-----diff --git a/drivers/iommu/arm-smmu-v3.c > b/drivers/iommu/arm-smmu-v3.c > index f122071688fd..d8a730d83401 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -1731,7 +1731,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct > arm_smmu_device *smmu, u32 sid) > > static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec) > { > - int i; > + int i, j; > struct arm_smmu_master_data *master = fwspec->iommu_priv; > struct arm_smmu_device *smmu = master->smmu; > > @@ -1739,6 +1739,13 @@ static void arm_smmu_install_ste_for_dev(struct > iommu_fwspec *fwspec) > u32 sid = fwspec->ids[i]; > __le64 *step = arm_smmu_get_step_for_sid(smmu, sid); > > + /* Bridged PCI devices may end up with duplicated IDs */ > + for (j = 0; j < i; j++) > + if (fwspec->ids[j] == sid) > + break; > + if (j < i) > + continue; > + > arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste); > } > } Here is another tested by if you need one more: Tested-by: Jayachandran C. <jnair@xxxxxxxxxxxxxxxxxx> This fixes the crash below seen in ThunderX2 boards with Aspeed BMC (when booted without iommu.passthrough). Since this is a regression that breaks bootup on the platform, can you consider submitting this for the 4.15 cycle? [ 84.729351] ------------[ cut here ]------------ [ 84.729354] kernel BUG at /home/ubuntu/linux/drivers/iommu/arm-smmu-v3.c:1201! [ 84.729358] Internal error: Oops - BUG: 0 [#1] SMP [ 84.729360] Modules linked in: ast(+) ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops qede(+) drm q ed bnx2x(+) mpt3sas raid_class scsi_transport_sas sdhci_acpi mdio sdhci libcrc32c gpio_xlp [ 84.729375] CPU: 190 PID: 1725 Comm: systemd-udevd Not tainted 4.15.0-rc5 #37 [ 84.729376] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 11/08/2017 [ 84.729379] pstate: 80400009 (Nzcv daif +PAN -UAO) [ 84.729388] pc : arm_smmu_write_strtab_ent+0x1f0/0x1f8 [ 84.729391] lr : arm_smmu_install_ste_for_dev+0x70/0xc8 [ 84.729392] sp : ffff00001f6e3730 [ 84.729393] x29: ffff00001f6e3730 x28: ffff809ecc0d2268 [ 84.729395] x27: 0000000000000018 x26: ffff00001f6e3910 [ 84.729397] x25: ffff809ecc0d2238 x24: ffff809ecdc94158 [ 84.729398] x23: 0000000000000018 x22: 0000000000001d00 [ 84.729400] x21: ffff809ecdc90018 x20: ffff809ecb5ef288 [ 84.729401] x19: ffff80beca480000 x18: ffff0000093b8c08 [ 84.729402] x17: ffff000008aeab70 x16: ffff00001f6e3a20 [ 84.729404] x15: 0000000000000000 x14: dead000000000100 [ 84.729405] x13: dead000000000200 x12: 0000000000000020 [ 84.729407] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f [ 84.729408] x9 : 0000000000000000 x8 : 0000000000000000 [ 84.729410] x7 : 0000000000000100 x6 : 0000000000000015 [ 84.729411] x5 : 0000000000000015 x4 : 0000000000000002 [ 84.729413] x3 : ffff809ecb5ef288 x2 : 0000000000000001 [ 84.729414] x1 : ffff000008691e30 x0 : ffff809ecc0d2238 [ 84.729418] Process systemd-udevd (pid: 1725, stack limit = 0x000000002c585821) [ 84.729420] Call trace: [ 84.729423] arm_smmu_write_strtab_ent+0x1f0/0x1f8 [ 84.729425] arm_smmu_install_ste_for_dev+0x70/0xc8 [ 84.729426] arm_smmu_attach_dev+0x100/0x2f8 [ 84.729431] __iommu_attach_device+0x54/0xe0 [ 84.729433] iommu_group_add_device+0x150/0x428 [ 84.729435] iommu_group_get_for_dev+0x84/0x180 [ 84.729436] arm_smmu_add_device+0x138/0x240 [ 84.729445] iort_iommu_configure+0x138/0x188 [ 84.729452] acpi_dma_configure+0x3c/0x80 [ 84.729456] dma_configure+0xb0/0xe0 [ 84.729462] driver_probe_device+0x1f0/0x4a8 [ 84.729464] __driver_attach+0x124/0x128 [ 84.729466] bus_for_each_dev+0x70/0xb0 [ 84.729467] driver_attach+0x30/0x40 [ 84.729469] bus_add_driver+0x248/0x2b8 [ 84.729471] driver_register+0x68/0x100 [ 84.729478] __pci_register_driver+0x5c/0x70 [ 84.729488] ast_init+0x30/0x1000 [ast] [ 84.729494] do_one_initcall+0x5c/0x168 [ 84.729501] do_init_module+0x64/0x1f4 [ 84.729502] load_module+0x1e64/0x21e8 [ 84.729503] SyS_finit_module+0x108/0x118 [ 84.729505] el0_svc_naked+0x20/0x24 [ 84.729508] Code: 34ffff82 d4210000 d2800120 17ffffd8 (d4210000) Thanks, JC.