On Tue, Feb 23, 2021 at 08:10:41AM +0300, Dmitry Osipenko wrote: > 23.02.2021 05:13, Nicolin Chen пишет: > > Hi Dmitry, > > > > On Sat, Feb 20, 2021 at 08:16:22AM +0300, Dmitry Osipenko wrote: > >> 19.02.2021 01:07, Nicolin Chen пишет: > >>> Commit 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()") > >>> removed certain hack in the tegra_smmu_probe() by relying on IOMMU core to > >>> of_xlate SMMU's SID per device, so as to get rid of tegra_smmu_find() and > >>> tegra_smmu_configure() that are typically done in the IOMMU core also. > >>> > >>> This approach works for both existing devices that have DT nodes and other > >>> devices (like PCI device) that don't exist in DT, on Tegra210 and Tegra3 > >>> upon testing. However, Page Fault errors are reported on tegra124-Nyan: > >>> > >>> tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40: > >>> EMEM address decode error (SMMU translation error [--S]) > >>> tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40: > >>> Page fault (SMMU translation error [--S]) > >>> > >>> After debugging, I found that the mentioned commit changed some function > >>> callback sequence of tegra-smmu's, resulting in enabling SMMU for display > >>> client before display driver gets initialized. I couldn't reproduce exact > >>> same issue on Tegra210 as Tegra124 (arm-32) differs at arch-level code. > >> > >> Hello Nicolin, > >> > >> Could you please explain in a more details what exactly makes the > >> difference for the callback sequence? > > > > Here is a log with 5.11.0-rc6: > > https://lava.collabora.co.uk/scheduler/job/3187849 > > [dump_stack was added in some tegra-smmu functions] > > > > And here is a corresponding log with reverting the original commit: > > https://lava.collabora.co.uk/scheduler/job/3187851 > > > > Here is a log with 5.11.0-rc7-next-20210210: > > https://lava.collabora.co.uk/scheduler/job/3210245 > > > > And here is a corresponding log with reverting the original commit: > > https://lava.collabora.co.uk/scheduler/job/3210596 > > > > Both failing logs show that mc errors started right after client DC > > got enabled by ->attach_dev() callback that in the passing logs was > > not called until Host1x driver init. And note that two failing logs > > show that ->attach_dev() could be called from two different sources, > > of_dma_configure_id() or arch_setup_dma_ops(). > > > > The reason why ->attach_dev() gets called is probably related to the > > following reasons (sorry, can't be 100% sure as I don't have Tegra124 > > or other 32bit Tegra board to test): > > 1) With the commit reverted, all clients are probed in "arch" stage, > > which is even prior to iommu core initialization -- including it > > setting default domain type. This probably messed up the type of > > allocating domains against the default domain type. Also internal > > group is somehow affected. So some condition check in iommu core > > failed and then it bypassed ->attach_dev callback in really_probe > > stage, until Host1x driver does attach_dev again. > > > > 2) 32bit ARM has arch_setup_dma_ops() does an additional set of iommu > > domain allocation + attach_dev(), after of_dma_configure_id() did > > once. This isn't reproducible for me on Tegra210. > > > > As debugging online isn't very efficient, and given that Thierry has > > been working on the linear mapping of framebuffer carveout, I choose > > to partially revert as a quick fix. > > The partially revert should be okay, but it's not clear to me what makes > difference for T124 since I don't see that problem on T30, which also > has active display at a boot time. Hmm..do you see ->attach_dev() is called from host1x_client_iommu_attach or from of_dma_configure_id/arch_setup_dma_ops?