Re: [PATCH 21/24] drm/amdkfd: add queue remapping

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2023-11-23 14:02, Felix Kuehling wrote:
On 2023-11-23 11:25, James Zhu wrote:

On 2023-11-22 17:35, Felix Kuehling wrote:

On 2023-11-03 09:11, James Zhu wrote:
Add queue remapping to force the waves in any running
processes to complete a CWSR trap.

Please add an explanation why this is needed.

[JZ] Even though the profiling-enabled bits is turned off, the CWSR trap handlers for some kernels with this process may still in running stage, this will

force the waves in any running processes to complete a CWSR trap, and make sure pc sampling is completely stopped with this process.   I will add it later.

It may be confusing to talk specifically about "CWSR trap handler". There is only one trap handler that is triggered by different events: CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap triggers, it serializes with any currently running trap handler in that wavefront. So it seems that you're using CWSR as a way to ensure that any host trap has completed: CWSR will wait for previous traps to finish before trapping again for CWSR, the HWS firmware waits for CWSR completion and the driver waits for HWS to finish CWSR with a fence on a HIQ QUERY_STATUS packet. Is that correct?
[JZ] I think your explanation is more detail. Need Joseph to confirm.

Regards,
  Felix






Signed-off-by: James Zhu <James.Zhu@xxxxxxx>
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++++++++++
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +++++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c          |  3 +++
  3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager *dqm)
      return debug_map_and_unlock(dqm);
  }
  +void remap_queue(struct device_queue_manager *dqm,
+                enum kfd_unmap_queues_filter filter,
+                uint32_t filter_param,
+                uint32_t grace_period)

Not sure if you need the filter and grace period parameters in this function. What's the point of exposing that to callers who just want to trigger a CWSR? You could also change the function name to reflect the purpose of the function, rather than the implementation.
[JZ] Just want to create a general function in case that used by others. I am fine to remove passing filter_param/grace_period

Regards,
  Felix


+{
+    dqm_lock(dqm);
+    if (!dqm->dev->kfd->shared_resources.enable_mes)
+        execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+    dqm_unlock(dqm);
+}
+
  #if defined(CONFIG_DEBUG_FS)
    static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);
  int debug_map_and_unlock(struct device_queue_manager *dqm);
  int debug_refresh_runlist(struct device_queue_manager *dqm);
  +void remap_queue(struct device_queue_manager *dqm,
+                enum kfd_unmap_queues_filter filter,
+                uint32_t filter_param,
+                uint32_t grace_period);
+
  static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
  {
      return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index e8f0559b618e..66670cdb813a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
  #include "kfd_priv.h"
  #include "amdgpu_amdkfd.h"
  #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
    struct supported_pc_sample_info {
      uint32_t ip_version;
@@ -164,6 +165,8 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd, cancel_work_sync(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+        remap_queue(pdd->dev->dqm,
+            KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
            mutex_lock(&pdd->dev->pcs_data.mutex);
pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux