[Public]

> From: Jesse.zhang@xxxxxxx <jesse.zhang@xxxxxxx>
> Sent: Wednesday, December 18, 2024 6:26 PM
> Subject: [PATCH] drm/amdkfd: fixed page fault when enable MES shader debugger
>
> Initialize the process context address before setting the shader debugger.

I think it would make sense to pull this into its own function if it's
duplicated in multiple places.

Also, we may need to add a check before amdgpu_mes_flush_shader_debugger as
well. In that case it seems it would be enough to just skip the call if
gpu_addr is null.

Regards,
Teddy

> [ 260.781212] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
> [ 260.781236] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
> [ 260.781255] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A40
> [ 260.781270] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
> [ 260.781284] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
> [ 260.781296] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
> [ 260.781308] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x4
> [ 260.781320] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
> [ 260.781332] amdgpu 0000:03:00.0: amdgpu: RW: 0x1
> [ 260.782017] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
> [ 260.782039] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
> [ 260.782058] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A41
> [ 260.782073] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
> [ 260.782087] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
> [ 260.782098] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
> [ 260.782110] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x4
> [ 260.782122] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
> [ 260.782137] amdgpu 0000:03:00.0: amdgpu: RW: 0x1
> [ 260.782155] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
> [ 260.782166] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
>
> Signed-off-by: Jesse Zhang <jesse.zhang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> index 312dfa84f29f..a8abc3091801 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> @@ -350,10 +350,27 @@ int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd, bool sq_trap_en)
>  {
>      uint32_t spi_dbg_cntl = pdd->spi_dbg_override | pdd->spi_dbg_launch_mode;
>      uint32_t flags = pdd->process->dbg_flags;
> +    struct amdgpu_device *adev = pdd->dev->adev;
> +    int r;
>
>      if (!kfd_dbg_is_per_vmid_supported(pdd->dev))
>          return 0;
>
> +    if (!pdd->proc_ctx_cpu_ptr) {
> +        r = amdgpu_amdkfd_alloc_gtt_mem(adev,
> +                AMDGPU_MES_PROC_CTX_SIZE,
> +                &pdd->proc_ctx_bo,
> +                &pdd->proc_ctx_gpu_addr,
> +                &pdd->proc_ctx_cpu_ptr,
> +                false);
> +        if (r) {
> +            dev_err(adev->dev,
> +                "failed to allocate process context bo\n");
> +            return r;
> +        }
> +        memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE);
> +    }
> +
>      return amdgpu_mes_set_shader_debugger(pdd->dev->adev, pdd->proc_ctx_gpu_addr, spi_dbg_cntl,
>                          pdd->watch_points, flags, sq_trap_en);
>  }
> --
> 2.25.1
>
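
To make the two suggestions above concrete, here is a rough userspace model of
what I have in mind. It is only a sketch, not driver code: struct pdd_model,
PROC_CTX_SIZE and the pdd_* functions are hypothetical stand-ins, calloc()
stands in for the GTT allocation, and the flush path only counts calls. The
point is just the shape: one helper that allocates the process context once,
and a flush path that returns early when there is no GPU address.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for AMDGPU_MES_PROC_CTX_SIZE. */
#define PROC_CTX_SIZE 4096

/* Hypothetical, trimmed-down model of the per-device process data. */
struct pdd_model {
    void *proc_ctx_cpu_ptr;      /* CPU mapping of the process context */
    uint64_t proc_ctx_gpu_addr;  /* 0 means "not allocated yet" */
    int flush_calls;             /* counts flushes that were actually issued */
};

/*
 * Shared helper: allocate and zero the process context on first use.
 * Every caller that needs the context goes through here, so the
 * allocation logic exists in exactly one place.
 */
int pdd_ensure_proc_ctx(struct pdd_model *pdd)
{
    if (pdd->proc_ctx_cpu_ptr)
        return 0; /* already set up, nothing to do */

    /* calloc() models "allocate, then memset to 0" from the patch. */
    pdd->proc_ctx_cpu_ptr = calloc(1, PROC_CTX_SIZE);
    if (!pdd->proc_ctx_cpu_ptr)
        return -ENOMEM;

    /* In this model the "GPU address" is just the pointer value. */
    pdd->proc_ctx_gpu_addr = (uint64_t)(uintptr_t)pdd->proc_ctx_cpu_ptr;
    return 0;
}

/* Models the set-debug-mode path: make sure the context exists first. */
int pdd_set_debug_mode(struct pdd_model *pdd)
{
    int r = pdd_ensure_proc_ctx(pdd);

    if (r)
        return r;
    /* The real code would program the shader debugger here. */
    return 0;
}

/* Models the flush path: skip the call when there is no context. */
int pdd_flush_debugger(struct pdd_model *pdd)
{
    if (!pdd->proc_ctx_gpu_addr)
        return 0; /* nothing was ever set up, so nothing to flush */

    pdd->flush_calls++; /* the real code would issue the flush here */
    return 0;
}
```

With a zero-initialized struct pdd_model, calling pdd_flush_debugger() first is
a no-op, pdd_set_debug_mode() allocates the context once, and a second call
reuses the same buffer. Whether the real flush path should return 0 or an
error in the null case is a separate question for the driver.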