The following page fault was observed during the exit moment of the HIP test process. In this particular error case, the HIP test (./MemcpyPerformance -h) does not require the AQL queue. As a result, the process_context_addr was not assigned when the KFD process was released, ultimately leading to this page fault during the execution of kfd_process_dequeue_from_all_devices(). [345962.294891] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0) [345962.295333] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10 [345962.295775] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33 [345962.296097] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5) [345962.296394] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1 [345962.296633] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x1 [345962.296876] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3 [345962.297135] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1 [345962.297377] amdgpu 0000:03:00.0: amdgpu: RW: 0x0 [345962.297682] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0) Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx> --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index cee38bb6cfaf..4d313144cc4b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -1062,6 +1062,11 @@ int amdgpu_mes_flush_shader_debugger(struct amdgpu_device *adev, return -EINVAL; } + if (!process_context_addr) { + dev_warn(adev->dev, "invalidated process context addr\n"); + return -EINVAL; + } + op_input.op = MES_MISC_OP_SET_SHADER_DEBUGGER; op_input.set_shader_debugger.process_context_addr = process_context_addr; op_input.set_shader_debugger.flags.process_ctx_flush = true; -- 2.34.1