On 2021-12-23 11:06, asheplyakov@xxxxxxxxxx wrote:
From: Alexey Sheplyakov <asheplyakov@xxxxxxxxxx>
T62x/T60x GPUs are known to not work with panfrost as of now.
One of the reasons is wrong/incomplete memory attributes which
the panfrost driver sets in the page tables:
- MEMATTR_IMP_DEF should be 0x48ULL, not 0x88ULL.
0x88ULL is MEMATTR_OUTER_IMP_DEF
I guess the macro could be renamed if anyone's particularly bothered,
but using the outer-cacheable attribute is deliberate because it is
necessary for I/O-coherent GPUs to work properly (and should be
irrelevant for non-coherent integrations). I'm away from my Juno board
for the next couple of weeks but I'm fairly confident that this change
as-is will break cache snooping.
- MEMATTR_FORCE_CACHE_ALL and MEMATTR_OUTER_WA are missing.
They're "missing" because they're not used, and there's currently no
mechanism by which they *could* be used. Also note that the indices in
MEMATTR have to line up with the equivalent LPAE indices for the closest
match to what the IOMMU API's IOMMU_{CACHE,MMIO} flags represent, so
moving those around is yet more subtle breakage.
T72x and newer GPUs work just fine with such incomplete/wrong memory
attributes. However T62x are quite picky and quickly lock up.
To avoid the problem set the same memory attributes (in LPAE page
tables) as mali_kbase.
The patch has been tested (for regressions) with T860 GPU (rk3399 SoC).
At first glance (using GNOME desktop, running glmark) it does
not cause any changes for this GPU.
Note: this patch is necessary, but *not* enough to get panfrost
working with T62x.
I'd note that panfrost has been working OK - to the extent that Mesa
supports its older ISA - on the T624 (single core group) in Arm's Juno
SoC for over a year now since commit 268af50f38b1.
If you have to force outer non-cacheable to avoid getting translation
faults and other errors that look like the GPU is inexplicably seeing
the wrong data, I'd check whether you have the same thing where your
integration is actually I/O-coherent and you're missing the
"dma-coherent" property in your DT.
Thanks,
Robin.
Signed-off-by: Alexey Sheplyakov <asheplyakov@xxxxxxxxxx>
Signed-off-by: Vadim V. Vlasov <vadim.vlasov@xxxxxxxxxxx>
Cc: Rob Herring <robh@xxxxxxxxxx>
Cc: Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx>
Cc: Steven Price <steven.price@xxxxxxx>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@xxxxxxxxxxxxx>
Cc: Vadim V. Vlasov <vadim.vlasov@xxxxxxxxxxx>
---
drivers/gpu/drm/panfrost/panfrost_mmu.c | 3 ---
drivers/iommu/io-pgtable-arm.c | 16 ++++++++++++----
2 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 39562f2d11a4..2f4f8a17bc82 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -133,9 +133,6 @@ static void panfrost_mmu_enable(struct panfrost_device *pfdev, struct panfrost_m
mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), lower_32_bits(transtab));
mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), upper_32_bits(transtab));
- /* Need to revisit mem attrs.
- * NC is the default, Mali driver is inner WT.
- */
mmu_write(pfdev, AS_MEMATTR_LO(as_nr), lower_32_bits(memattr));
mmu_write(pfdev, AS_MEMATTR_HI(as_nr), upper_32_bits(memattr));
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dd9e47189d0d..15b39c337e20 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -122,13 +122,17 @@
#define ARM_LPAE_MAIR_ATTR_IDX_CACHE 1
#define ARM_LPAE_MAIR_ATTR_IDX_DEV 2
#define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE 3
+#define ARM_LPAE_MAIR_ATTR_IDX_OUTER_WA 4
#define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0)
#define ARM_MALI_LPAE_TTBR_READ_INNER BIT(2)
#define ARM_MALI_LPAE_TTBR_SHARE_OUTER BIT(4)
-#define ARM_MALI_LPAE_MEMATTR_IMP_DEF 0x88ULL
-#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
+#define ARM_MALI_LPAE_MEMATTR_IMP_DEF 0x48ULL
+#define ARM_MALI_LPAE_MEMATTR_FORCE_CACHE_ALL 0x4FULL
+#define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x4DULL
+#define ARM_MALI_LPAE_MEMATTR_OUTER_IMP_DEF 0x88ULL
+#define ARM_MALI_LPAE_MEMATTR_OUTER_WA 0x8DULL
#define APPLE_DART_PTE_PROT_NO_WRITE (1<<7)
#define APPLE_DART_PTE_PROT_NO_READ (1<<8)
@@ -1080,10 +1084,14 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
cfg->arm_mali_lpae_cfg.memattr =
(ARM_MALI_LPAE_MEMATTR_IMP_DEF
<< ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_NC)) |
+ (ARM_MALI_LPAE_MEMATTR_FORCE_CACHE_ALL
+ << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
(ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC
<< ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_CACHE)) |
- (ARM_MALI_LPAE_MEMATTR_IMP_DEF
- << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
+ (ARM_MALI_LPAE_MEMATTR_OUTER_IMP_DEF
+ << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)) |
+ (ARM_MALI_LPAE_MEMATTR_OUTER_WA
+ << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_OUTER_WA));
data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
cfg);