On Fri, Oct 18, 2024 at 7:19 AM Christian König <christian.koenig@xxxxxxx> wrote: > > Am 18.10.24 um 11:33 schrieb Zhang, Jesse(Jie): > > [AMD Official Use Only - AMD Internal Distribution Only] > > > > Hi Christian, > > > > -----Original Message----- > > From: Koenig, Christian <Christian.Koenig@xxxxxxx> > > Sent: Friday, October 18, 2024 4:47 PM > > To: Zhang, Jesse(Jie) <Jesse.Zhang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx> > > Subject: Re: [PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset > > > > Am 18.10.24 um 10:19 schrieb Jesse.zhang@xxxxxxx: > >> Not all ASICs support the queue reset feature. > >> Therefore, userspace can query this feature via > >> AMDGPU_INFO_QUEUE_RESET before validating a queue reset. > > Why would UMDs need that information? > > > > To verify queue reset. > > Now the igt uses many asic filters to hard code if queue reset is ready. > > Alex suggested that we can get the information directly from the driver. > > Ah, ok got it. Mhm in general sounds like the approach is cleaner, but > the IOCTL interface is supposed to be used by the UMD and tested by the > IGT tests. > > The problem is now that it's documented that the justification for > having the IOCTLs is the UMD and not the IGT tests. > > Could we also do this as debugfs interface? > > > Another problem is that our driver has added code about queue reset. > > The reset function is complete (adev->gfx.gfx_ring[0].funcs->reset), > > but fw partially supports it. > > For example navi31, KCQ invalid opcode case can be recovered by queue reset, > > but KCQ invalid packet length cannot be recovered now. > > > > So for this case, I am not sure whether we can return true for this function. > > > > More information can be obtained from the link: > > https://confluence.amd.com/display/AMDGPU/Phase+1+-+Validation+of+Per+Queue+Reset+for+Kernel+Queue > > Oh, yeah good question. Alex should probably answer that. Maybe a better option would be a reset_mask sysfs file for each IP which we could have a bit for each reset type. E.g., gc_reset_mask sdma_reset_mask vcn_reset_mask jpeg_reset_mask vpe_reset_mask AMDGPU_RESET_TYPE_FULL (1 << 0) /* full adapter reset, mode1/mode2/BACO/etc. */ AMDGPU_RESET_TYPE_SOFT_RESET (1 << 1) /* IP level soft reset */ AMDGPU_RESET_TYPE_PER_QUEUE (1 << 2) /* per queue */ AMDGPU_RESET_TYPE_PER_PIPE (1 << 3) /* per pipe */ Then the IGT tests could query the mask and determine which tests to run. Alex > > Regards, > Christian. > > > > > Thanks > > Jesse > >> Signed-off-by: Jesse Zhang <jesse.zhang@xxxxxxx> > >> --- > >> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 27 +++++++++++++++++++++++++ > >> include/uapi/drm/amdgpu_drm.h | 2 ++ > >> 2 files changed, 29 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > >> index b53c35992152..87dee858fb4c 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c > >> @@ -577,6 +577,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) > >> uint64_t ui64 = 0; > >> int i, found, ret; > >> int ui32_size = sizeof(ui32); > >> + bool queue_reset; > >> > >> if (!info->return_size || !info->return_pointer) > >> return -EINVAL; > >> @@ -1282,6 +1283,32 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) > >> return copy_to_user(out, &gpuvm_fault, > >> min((size_t)size, sizeof(gpuvm_fault))) ? -EFAULT : 0; > >> } > >> + case AMDGPU_INFO_QUEUE_RESET: { > >> + fpriv = (struct amdgpu_fpriv *)filp->driver_priv; > >> + type = amdgpu_ip_get_block_type(adev, info->query_hw_ip.type); > >> + ip_block = amdgpu_device_ip_get_ip_block(adev, type); > >> + > >> + if (!ip_block || !ip_block->status.valid) > >> + return -EINVAL; > >> + > >> + switch (info->query_hw_ip.type) { > >> + case AMDGPU_HW_IP_GFX: > >> + queue_reset = adev->gfx.gfx_ring[0].funcs->reset ? true : false; > > Please never ever use this coding style. > > > > If you want to convert anything into a boolean use the !! operator. > > > > Regards, > > Christian. > > > >> + break; > >> + case AMDGPU_HW_IP_COMPUTE: > >> + queue_reset = adev->gfx.compute_ring[0].funcs->reset ? true : false; > >> + break; > >> + case AMDGPU_HW_IP_DMA: > >> + queue_reset = adev->sdma.instance[0].ring.funcs->reset ? true : false; > >> + break; > >> + case AMDGPU_HW_IP_UVD_ENC: > >> + case AMDGPU_HW_IP_VCN_DEC: > >> + default: > >> + queue_reset = false; > >> + } > >> + > >> + return copy_to_user(out, &queue_reset, min(size, 4u)) ? -EFAULT : 0; > >> + } > >> default: > >> DRM_DEBUG_KMS("Invalid request %d\n", info->query); > >> return -EINVAL; > >> diff --git a/include/uapi/drm/amdgpu_drm.h > >> b/include/uapi/drm/amdgpu_drm.h index d9bff1c3b326..3b17d82fd1ee > >> 100644 > >> --- a/include/uapi/drm/amdgpu_drm.h > >> +++ b/include/uapi/drm/amdgpu_drm.h > >> @@ -1052,6 +1052,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow { > >> #define AMDGPU_INFO_MAX_IBS 0x22 > >> /* query last page fault info */ > >> #define AMDGPU_INFO_GPUVM_FAULT 0x23 > >> +/* query queue reset */ > >> +#define AMDGPU_INFO_QUEUE_RESET 0x24 > >> > >> #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT 0 > >> #define AMDGPU_INFO_MMR_SE_INDEX_MASK 0xff >