Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2022-01-19 09:50, Russell, Kent wrote:
[AMD Official Use Only]

-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Sent: Tuesday, January 18, 2022 7:16 PM
To: Russell, Kent <Kent.Russell@xxxxxxx>; Huang, JinHuiEric
<JinHuiEric.Huang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

Am 2022-01-18 um 7:08 p.m. schrieb Russell, Kent:
One question inline


*KENT RUSSELL***

Sr. Software Engineer | Linux Compute Kernel

1 Commerce Valley Drive East

Markham, ON L3T 7X6

*O*+(1) 289-695-2122**| Ext 72122


------------------------------------------------------------------------
*From:* amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> on behalf of
Felix Kuehling <felix.kuehling@xxxxxxx>
*Sent:* Tuesday, January 18, 2022 6:36 PM
*To:* Huang, JinHuiEric <JinHuiEric.Huang@xxxxxxx>;
amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
*Subject:* Re: [PATCH] drm/amdkfd: enable heavy-weight TLB flush on
Arcturus

Am 2022-01-18 um 5:45 p.m. schrieb Eric Huang:
SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang <jinhuieric.huang@xxxxxxx>
Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  6 ------
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 10 ++++++++--
  2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a64cbbd943ba..acb4fd973e60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1892,12 +1892,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
                                true);
        ret = unreserve_bo_and_vms(&ctx, false, false);

-     /* Only apply no TLB flush on Aldebaran to
-      * workaround regressions on other Asics.
-      */
-     if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
-             *table_freed = true;
-
        goto out;

  out_unreserve:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b570c0454ce9..485d4c52c7de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1596,6 +1596,12 @@ static int
kfd_ioctl_free_memory_of_gpu(struct file *filep,
        return ret;
  }

+static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) {
+     return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)
Do we need to add a check for sdma ver >=8 here?
What's the significance of version 8 for Aldebaran? This code was
working on Aldebaran without a version check before. Did we ever
publicly release an SDMA firmware older than version 8 that for Aldebaran?
We released v5 for Aldebaran SDMA in ROCm 4.5 . If I remember the ticket correctly, the same fix for Arcturus was required for Aldebaran and was part of SDMA v8. But Eric is obviously watching the ticket more closely than I, so I'll defer to him there.
Yes. Aldebaran has the same bug as Arcturus in SDMA, but the bug doesn't cause GPU hang on Aldebaran. As Felix said heavy-weight TLB flush have been working on Aldebaran since it was enabled, so we don't need to check the version for it.

Regards,
Eric

  Kent

Regards,
   Felix


||
+            (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
+             dev->adev->sdma.instance[0].fw_version >= 18);
+}
+
  static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
                                        struct kfd_process *p, void
*data)
  {
@@ -1692,7 +1698,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct
file *filep,
        }

        /* Flush TLBs after waiting for the page table updates to
complete */
-     if (table_freed) {
+     if (table_freed || !kfd_flush_tlb_after_unmap(dev)) {
                for (i = 0; i < args->n_devices; i++) {
                        peer = kfd_device_by_id(devices_arr[i]);
                        if (WARN_ON_ONCE(!peer))
@@ -1806,7 +1812,7 @@ static int
kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
        }
        mutex_unlock(&p->mutex);

-     if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
+     if (kfd_flush_tlb_after_unmap(dev)) {
                err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
                                (struct kgd_mem *) mem, true);
                if (err) {




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux