08.04.2021 16:26, Thierry Reding wrote:
> On Thu, Apr 08, 2021 at 02:42:42AM -0700, Nicolin Chen wrote:
>> On Mon, Mar 29, 2021 at 02:32:55AM +0300, Dmitry Osipenko wrote:
>>> All consumer-grade Android and Chromebook devices show a splash screen
>>> on boot and then display is left enabled when kernel is booted. This
>>> behaviour is unacceptable in a case of implicit IOMMU domains to which
>>> devices are attached during kernel boot since devices, like display
>>> controller, may perform DMA at that time. We can work around this problem
>>> by deferring the enable of SMMU translation for a specific devices,
>>> like a display controller, until the first IOMMU mapping is created,
>>> which works good enough in practice because by that time h/w is already
>>> stopped.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx>
>>
>> For both patches:
>> Acked-by: Nicolin Chen <nicoleotsuka@xxxxxxxxx>
>> Tested-by: Nicolin Chen <nicoleotsuka@xxxxxxxxx>
>>
>> The WAR looks good to me. Perhaps Thierry would give some input.

Nicolin, thank you very much for the help!

>> Another topic:
>> I think this may help work around the mc-errors, which we have
>> been facing on Tegra210 also when we enable IOMMU_DOMAIN_DMA.
>> (attached a test patch rebasing on these two)
>
> Ugh... that's exactly what I was afraid of. Now everybody is going to
> think that we can just work around this issue with driver-specific SMMU
> hacks...
>
>> However, GPU would also report errors using DMA domain:
>>
>>   nouveau 57000000.gpu: acr: firmware unavailable
>>   nouveau 57000000.gpu: pmu: firmware unavailable
>>   nouveau 57000000.gpu: gr: firmware unavailable
>>   tegra-mc 70019000.memory-controller: gpusrd: read @0x00000000fffbe200: Security violation (TrustZone violation)
>>   nouveau 57000000.gpu: DRM: failed to create kernel channel, -22
>>   tegra-mc 70019000.memory-controller: gpusrd: read @0x00000000fffad000: Security violation (TrustZone violation)
>>   nouveau 57000000.gpu: fifo: SCHED_ERROR 20 []
>>   nouveau 57000000.gpu: fifo: SCHED_ERROR 20 []
>>
>> Looking at the address, seems that GPU allocated memory in 32-bit
>> physical address space behind SMMU, so a violation happened after
>> turning on DMA domain I guess...
>
> The problem with GPU is... extra complicated. You're getting these
> faults because you're enabling the IOMMU-backed DMA API, which then
> causes the Nouveau driver allocate buffers using the DMA API instead of
> explicitly allocating pages and then mapping them using the IOMMU API.
> However, there are additional patches needed to teach Nouveau about how
> to deal with SMMU and those haven't been merged yet. I've got prototypes
> of this, but before the whole framebuffer carveout passing work makes
> progress there's little sense in moving individual pieces forward.
>
> One more not to try and cut corners. We know what the right solution is,
> even if it takes a lot of work. I'm willing to ack this patch, or some
> version of it, but only as a way of working around things we have no
> realistic chance of fixing properly anymore. I still think it would be
> best if we could derive identity mappings from command-line arguments on
> these platforms because I think most of them will actually set that, and
> then the solution becomes at least uniform at the SMMU level.
>
> For Tegra210 I've already laid out a path to a solution that's going to
> be generic and extend to Tegra186 and later as well.

We still have issues in the DRM and other drivers that don't allow us to
flip ON the IOMMU_DOMAIN_DMA support.

My patch addresses the issue with the ARM_DMA_USE_IOMMU option, which
allocates the unmanaged domain for DMA purposes on ARM32. This causes
trouble in the multiplatform kernel configuration, since it's not
possible to opt out of ARM_DMA_USE_IOMMU in that case.

Perhaps this needs to be clarified in the commit message.

https://elixir.bootlin.com/linux/v5.12-rc6/source/arch/arm/mm/dma-mapping.c#L2078
https://elixir.bootlin.com/linux/v5.12-rc6/source/drivers/iommu/iommu.c#L1929