On 2023-11-23 11:25, James Zhu wrote:
On 2023-11-22 17:35, Felix Kuehling wrote:On 2023-11-03 09:11, James Zhu wrote:Add queue remapping to force the waves in any running processes to complete a CWSR trap.Please add an explanation why this is needed.[JZ] Even though the profiling-enabled bits is turned off, the CWSR trap handlers for some kernels with this process may still in running stage, this willforce the waves in any running processes to complete a CWSR trap, and make sure pc sampling is completely stopped with this process. I will add it later.
It may be confusing to talk specifically about "CWSR trap handler". There is only one trap handler that is triggered by different events: CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap triggers, it serializes with any currently running trap handler in that wavefront. So it seems that you're using CWSR as a way to ensure that any host trap has completed: CWSR will wait for previous traps to finish before trapping again for CWSR, the HWS firmware waits for CWSR completion and the driver waits for HWS to finish CWSR with a fence on a HIQ QUERY_STATUS packet. Is that correct?
Regards, Felix
[JZ] Just want to create a general function in case that used by others. I am fine to remove passing filter_param/grace_periodSigned-off-by: James Zhu <James.Zhu@xxxxxxx> ---drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++++++++++drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 5 +++++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 3 +++ 3 files changed, 19 insertions(+)diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.cindex c0e71543389a..a3f57be63f4f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager *dqm)return debug_map_and_unlock(dqm); } +void remap_queue(struct device_queue_manager *dqm, + enum kfd_unmap_queues_filter filter, + uint32_t filter_param, + uint32_t grace_period)Not sure if you need the filter and grace period parameters in this function. What's the point of exposing that to callers who just want to trigger a CWSR? You could also change the function name to reflect the purpose of the function, rather than the implementation.Regards, Felix+{ + dqm_lock(dqm); + if (!dqm->dev->kfd->shared_resources.enable_mes) + execute_queues_cpsch(dqm, filter, filter_param, grace_period); + dqm_unlock(dqm); +} + #if defined(CONFIG_DEBUG_FS) static void seq_reg_dump(struct seq_file *m,diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.hindex cf7e182588f8..f8aae3747a36 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);int debug_map_and_unlock(struct device_queue_manager *dqm); int debug_refresh_runlist(struct device_queue_manager *dqm); +void remap_queue(struct device_queue_manager *dqm, + enum kfd_unmap_queues_filter filter, + uint32_t filter_param, + uint32_t grace_period); +static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd){ return (pdd->lds_base >> 16) & 0xFF;diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.cindex e8f0559b618e..66670cdb813a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -24,6 +24,7 @@ #include "kfd_priv.h" #include "amdgpu_amdkfd.h" #include "kfd_pc_sampling.h" +#include "kfd_device_queue_manager.h" struct supported_pc_sample_info { uint32_t ip_version;@@ -164,6 +165,8 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd, cancel_work_sync(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);kfd_process_set_trap_pc_sampling_flag(&pdd->qpd, pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false); + remap_queue(pdd->dev->dqm,+ KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);mutex_lock(&pdd->dev->pcs_data.mutex); pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;