Re: [PATCH v2] drm/amdgpu: always use legacy tlb flush on cyan_skilfish

Felix Kuehling <felix.kuehling@xxxxxxx> · Thu, 14 Sep 2023 09:59:31 -0400

On 2023-09-14 9:39, Christian König wrote:
Is a single legacy flush sufficient to emulate an heavyweight flush as 
well?

On previous generations we needed to issue at least two legacy flushes 
for this.
I assume you are referring to the Vega20 XGMI workaround. That is a very 
different issue. Because PTEs would be cached in L2, we had to always 
use a heavy-weight flush that would also flush the L2 cache as well, and 
follow that with another legacy flush to deal with race conditions where 
stale PTEs could be re-fetched from L2 before the L2 flush was complete.

A heavy-weight flush guarantees that there are no more possible memory 
accesses using the old PTEs. With physically addressed caches on GFXv9 
that includes a cache flush because the address translation happened 
before putting data into the cache. I think the address translation and 
cache architecture works differently on GFXv10. So maybe the cache-flush 
is not required here.

But even then a legacy flush probably allows for in-flight memory 
accesses with old physical addresses to complete after the TLB flush. So 
there is a small risk of memory corruption that was assumed to not be 
accessed by the GPU any more. Or when using IOMMU device isolation it 
would result in IOMMU faults if the DMA mappings are invalidated 
slightly too early.

Regards,
  Felix



And please don't push before getting an rb from Felix as well.

Regards,
Christian.


Am 14.09.23 um 11:23 schrieb Lang Yu:
cyan_skilfish has problems with other flush types.

v2: fix incorrect ternary conditional operator usage.(Yifan)

Signed-off-by: Lang Yu <Lang.Yu@xxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # v5.15+
---
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index d3da13f4c80e..c6d11047169a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -236,7 +236,8 @@ static void gmc_v10_0_flush_vm_hub(struct 
amdgpu_device *adev, uint32_t vmid,
  {
      bool use_semaphore = gmc_v10_0_use_invalidate_semaphore(adev, 
vmhub);
      struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
-    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, 
flush_type);
+    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid,
+              (adev->asic_type != CHIP_CYAN_SKILLFISH) ? flush_type 
: 0);
      u32 tmp;
      /* Use register 17 for GART */
      const unsigned int eng = 17;
@@ -331,6 +332,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct 
amdgpu_device *adev, uint32_t vmid,
        int r;
  +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? 
flush_type : 0;
+
      /* flush hdp cache */
      adev->hdp.funcs->flush_hdp(adev, NULL);
  @@ -426,6 +429,8 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct 
amdgpu_device *adev,
      struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
      struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
  +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) ? 
flush_type : 0;
+
      if (amdgpu_emu_mode == 0 && ring->sched.ready) {
          spin_lock(&adev->gfx.kiq[0].ring_lock);
          /* 2 dwords flush + 8 dwords fence */