From: Thierry Reding <treding@xxxxxxxxxx>

Hi,

the GPU on Jetson TX2 (GP10B) does not work properly on all devices. Why
exactly is not clear, but there are slight differences between the SKUs
that were tested. It turns out that the biggest issue is that on some
devices (e.g. the one that I have), pulsing the GPU reset twice, as the
current code does (once as part of the power-ungate operation and then
again in the driver), causes the GPU to go into a bad state. Doing the
reset in the driver only if it hasn't already been done by the power
domain code fixes this issue.

Another issue is that the clock may be running at a rate of 0 Hz. This
is unlikely to happen because internally the clock can't actually run
that slowly, but explicitly setting the clock rate at probe time does
seem to help in some cases.

Patch 3 in this series unifies reading the WPR configuration by getting
it from GPU registers rather than reaching into the memory controller's
register space. This is slightly cleaner because it better separates the
two drivers and doesn't require an update every time the memory
controller moves to another register aperture.

Patch 4 ensures the L2 cache makes memory requests with the proper
stream ID, which is required when the GPU is behind an IOMMU.

Patch 5 changes the GP10B device initialization to use the correct copy
engine. GP10B is a Pascal generation GPU, and the way that engines are
described changes how the copy engines are enumerated compared to
earlier generations.

Patches 6 through 9 allow Nouveau to work on Tegra GPUs if the DMA API
is backed by an IOMMU. This is different from current assumptions
because mappings for all buffers mapped through the DMA API will need to
have the special IOMMU bit set in their page tables. Note that this
technically makes it possible to support big pages on Tegra because from
the GPU's point of view all memory is now contiguous.
However, these patches only make sure that buffers are mapped properly
and don't try to enable big pages. Also note that mapping through the
IOMMU comes at a slight cost, so this may not always be desirable.
However, with Tegra186 and later it's currently not possible (from a DMA
API point of view) to map only a subset of buffers through the IOMMU, so
any such optimization is deferred. Furthermore, the ARM SMMU driver
currently enforces the use of the SMMU by default, so there is not much
of a choice at the moment.

Finally, patches 10 and 11 enable the GPU on Jetson TX2 and make it use
the SMMU. I can pick up patches 10 and 11 into the Tegra tree once the
other patches have been merged into Nouveau.

Thierry

Alexandre Courbot (1):
  arm64: tegra: Enable GPU on Jetson TX2

Thierry Reding (10):
  drm/nouveau: tegra: Avoid pulsing reset twice
  drm/nouveau: tegra: Set clock rate if not set
  drm/nouveau: secboot: Read WPR configuration from GPU registers
  drm/nouveau: gp10b: Add custom L2 cache implementation
  drm/nouveau: gp10b: Use correct copy engine
  drm/nouveau: gk20a: Set IOMMU bit for DMA API if appropriate
  drm/nouveau: gk20a: Implement custom MMU class
  drm/nouveau: tegra: Skip IOMMU initialization if already attached
  drm/nouveau: tegra: Fall back to 32-bit DMA mask without IOMMU
  arm64: tegra: Enable SMMU for GPU on Tegra186

 .../boot/dts/nvidia/tegra186-p2771-0000.dts   |   4 +
 arch/arm64/boot/dts/nvidia/tegra186.dtsi      |   1 +
 .../gpu/drm/nouveau/include/nvkm/subdev/ltc.h |   1 +
 .../gpu/drm/nouveau/nvkm/engine/device/base.c |   4 +-
 .../drm/nouveau/nvkm/engine/device/tegra.c    | 152 +++++++++++------
 .../drm/nouveau/nvkm/subdev/instmem/gk20a.c   |  35 ++--
 .../gpu/drm/nouveau/nvkm/subdev/ltc/Kbuild    |   1 +
 .../gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c   |  69 ++++++++
 .../gpu/drm/nouveau/nvkm/subdev/ltc/priv.h    |   2 +
 .../gpu/drm/nouveau/nvkm/subdev/mmu/gk20a.c   |  50 +++++-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/gk20a.h   |  44 +++++
 .../gpu/drm/nouveau/nvkm/subdev/mmu/gm20b.c   |   6 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/gp10b.c   |   4 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgk20a.c    |  22 ++-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgm20b.c    |   4 +-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp10b.c    |  20 ++-
 .../drm/nouveau/nvkm/subdev/secboot/gm200.h   |   2 +-
 .../drm/nouveau/nvkm/subdev/secboot/gm20b.c   |  81 ++++++----
 .../drm/nouveau/nvkm/subdev/secboot/gp10b.c   |   4 +-
 20 files changed, 394 insertions(+), 113 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/ltc/gp10b.c
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gk20a.h

--
2.23.0
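
For illustration only, the reset and clock handling described for
patches 1 and 2 can be modelled roughly as follows. This is a standalone
sketch, not the code from the patches: struct gpu_model, the
gpu_model_*() helpers and GPU_MODEL_DEFAULT_HZ are all hypothetical
names, and the fallback rate is an arbitrary placeholder rather than a
value taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the platform state; not the real driver structures. */
struct gpu_model {
	bool has_power_domain;     /* power domain code handles the ungate */
	unsigned int reset_pulses; /* number of reset pulses seen by the GPU */
	unsigned long clock_hz;    /* rate currently reported for the clock */
};

/* Assumed placeholder rate, chosen only for this illustration. */
#define GPU_MODEL_DEFAULT_HZ 1000000UL

static void gpu_model_pulse_reset(struct gpu_model *gpu)
{
	gpu->reset_pulses++;
}

/* Stands in for the power-ungate sequence run by the power domain code. */
static void gpu_model_power_domain_on(struct gpu_model *gpu)
{
	assert(gpu != NULL && gpu->has_power_domain);
	gpu_model_pulse_reset(gpu);
}

/* Stands in for the driver's own power-up path. */
static void gpu_model_driver_power_up(struct gpu_model *gpu)
{
	assert(gpu != NULL);

	/* Pulse reset here only if the power domain code has not done so. */
	if (!gpu->has_power_domain)
		gpu_model_pulse_reset(gpu);

	/* Give the clock an explicit rate if it reports 0 Hz. */
	if (gpu->clock_hz == 0)
		gpu->clock_hz = GPU_MODEL_DEFAULT_HZ;
}
```

With a power domain present, calling gpu_model_power_domain_on() and
then gpu_model_driver_power_up() leaves the model with a single reset
pulse. Without one, the driver path alone provides that single pulse.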