Re: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



23.02.2021 05:13, Nicolin Chen пишет:
> Hi Dmitry,
> 
> On Sat, Feb 20, 2021 at 08:16:22AM +0300, Dmitry Osipenko wrote:
>> 19.02.2021 01:07, Nicolin Chen пишет:
>>> Commit 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
>>> removed certain hack in the tegra_smmu_probe() by relying on IOMMU core to
>>> of_xlate SMMU's SID per device, so as to get rid of tegra_smmu_find() and
>>> tegra_smmu_configure() that are typically done in the IOMMU core also.
>>>
>>> This approach works for both existing devices that have DT nodes and other
>>> devices (like PCI device) that don't exist in DT, on Tegra210 and Tegra3
>>> upon testing. However, Page Fault errors are reported on tegra124-Nyan:
>>>
>>>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>>> 	 EMEM address decode error (SMMU translation error [--S])
>>>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>>> 	 Page fault (SMMU translation error [--S])
>>>
>>> After debugging, I found that the mentioned commit changed some function
>>> callback sequence of tegra-smmu's, resulting in enabling SMMU for display
>>> client before display driver gets initialized. I couldn't reproduce exact
>>> same issue on Tegra210 as Tegra124 (arm-32) differs at arch-level code.
>>
>> Hello Nicolin,
>>
>> Could you please explain in a more details what exactly makes the
>> difference for the callback sequence?
> 
> Here is a log with 5.11.0-rc6:
> https://lava.collabora.co.uk/scheduler/job/3187849
> [dump_stack was added in some tegra-smmu functions]
> 
> And here is a corresponding log with reverting the original commit:
> https://lava.collabora.co.uk/scheduler/job/3187851
> 
> Here is a log with 5.11.0-rc7-next-20210210:
> https://lava.collabora.co.uk/scheduler/job/3210245
> 
> And here is a corresponding log with reverting the original commit:
> https://lava.collabora.co.uk/scheduler/job/3210596
> 
> Both failing logs show that mc errors started right after client DC
> got enabled by ->attach_dev() callback that in the passing logs was
> not called until Host1x driver init. And note that two failing logs
> show that ->attach_dev() could be called from two different sources,
> of_dma_configure_id() or arch_setup_dma_ops().
> 
> The reason why ->attach_dev() gets called is probably related to the
> following reasons (sorry, can't be 100% sure as I don't have Tegra124
> or other 32bit Tegra board to test):
> 1) With the commit reverted, all clients are probed in "arch" stage,
>    which is even prior to iommu core initialization -- including it
>    setting default domain type. This probably messed up the type of
>    allocating domains against the default domain type. Also internal
>    group is somehow affected. So some condition check in iommu core
>    failed and then it bypassed ->attach_dev callback in really_probe
>    stage, until Host1x driver does attach_dev again.
> 
> 2) 32bit ARM has arch_setup_dma_ops() does an additional set of iommu
>    domain allocation + attach_dev(), after of_dma_configure_id() did
>    once. This isn't reproducible for me on Tegra210.
> 
> As debugging online isn't very efficient, and given that Thierry has
> been working on the linear mapping of framebuffer carveout, I choose
> to partially revert as a quick fix.

The partially revert should be okay, but it's not clear to me what makes
difference for T124 since I don't see that problem on T30, which also
has active display at a boot time.




[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux