On Thu, Apr 08, 2021 at 02:42:42AM -0700, Nicolin Chen wrote: > On Mon, Mar 29, 2021 at 02:32:55AM +0300, Dmitry Osipenko wrote: > > All consumer-grade Android and Chromebook devices show a splash screen > > on boot and then display is left enabled when kernel is booted. This > > behaviour is unacceptable in a case of implicit IOMMU domains to which > > devices are attached during kernel boot since devices, like display > > controller, may perform DMA at that time. We can work around this problem > > by deferring the enable of SMMU translation for a specific devices, > > like a display controller, until the first IOMMU mapping is created, > > which works good enough in practice because by that time h/w is already > > stopped. > > > > Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx> > > For both patches: > Acked-by: Nicolin Chen <nicoleotsuka@xxxxxxxxx> > Tested-by: Nicolin Chen <nicoleotsuka@xxxxxxxxx> > > The WAR looks good to me. Perhaps Thierry would give some input. > > Another topic: > I think this may help work around the mc-errors, which we have > been facing on Tegra210 also when we enable IOMMU_DOMAIN_DMA. > (attached a test patch rebasing on these two) Ugh... that's exactly what I was afraid of. Now everybody is going to think that we can just work around this issue with driver-specific SMMU hacks... > However, GPU would also report errors using DMA domain: > > nouveau 57000000.gpu: acr: firmware unavailable > nouveau 57000000.gpu: pmu: firmware unavailable > nouveau 57000000.gpu: gr: firmware unavailable > tegra-mc 70019000.memory-controller: gpusrd: read @0x00000000fffbe200: Security violation (TrustZone violation) > nouveau 57000000.gpu: DRM: failed to create kernel channel, -22 > tegra-mc 70019000.memory-controller: gpusrd: read @0x00000000fffad000: Security violation (TrustZone violation) > nouveau 57000000.gpu: fifo: SCHED_ERROR 20 [] > nouveau 57000000.gpu: fifo: SCHED_ERROR 20 [] > > Looking at the address, seems that GPU allocated memory in 32-bit > physical address space behind SMMU, so a violation happened after > turning on DMA domain I guess... The problem with GPU is... extra complicated. You're getting these faults because you're enabling the IOMMU-backed DMA API, which then causes the Nouveau driver allocate buffers using the DMA API instead of explicitly allocating pages and then mapping them using the IOMMU API. However, there are additional patches needed to teach Nouveau about how to deal with SMMU and those haven't been merged yet. I've got prototypes of this, but before the whole framebuffer carveout passing work makes progress there's little sense in moving individual pieces forward. One more not to try and cut corners. We know what the right solution is, even if it takes a lot of work. I'm willing to ack this patch, or some version of it, but only as a way of working around things we have no realistic chance of fixing properly anymore. I still think it would be best if we could derive identity mappings from command-line arguments on these platforms because I think most of them will actually set that, and then the solution becomes at least uniform at the SMMU level. For Tegra210 I've already laid out a path to a solution that's going to be generic and extend to Tegra186 and later as well. Thierry
Attachment:
signature.asc
Description: PGP signature