On Thu, Mar 14, 2024 at 2:10 AM Sharma, Shashank <shashank.sharma@xxxxxxx> wrote: > > > On 14/03/2024 06:58, Khatri, Sunil wrote: > > > > On 3/14/2024 2:06 AM, Alex Deucher wrote: > >> On Tue, Mar 12, 2024 at 8:42 AM Sunil Khatri <sunil.khatri@xxxxxxx> > >> wrote: > >>> Add firmware version information of each > >>> IP and each instance where applicable. > >>> > >> Is there a way we can share some common code with devcoredump, > >> debugfs, and the info IOCTL? All three places need to query this > >> information and the same logic is repeated in each case. > > > > Hello Alex, > > > > Yes you re absolutely right the same information is being retrieved > > again as done in debugfs. I can reorganize the code so same function > > could be used by debugfs and devcoredump but this is exactly what i > > tried to avoid here. I did try to use minimum functionality in > > devcoredump without shuffling a lot of code here and there. > > > > Also our devcoredump is implemented in amdgpu_reset.c and not all the > > information is available here and there we might have to include lot > > of header and cross functions in amdgpu_reset until we want a > > dedicated file for devcoredump. > > > I think Alex is suggesting to have one common backend to generate all > the core debug info, and then different wrapper functions which can pack > this raw info into the packets aligned with respective infra like > devcore/debugfs/info IOCTL, which seems like a good idea to me. Right, something like this. I'm trying to avoid having to touch several places every time we add a new firmware type or TA, etc. Maybe something like an array of valid firmwares for each device and then we can just walk the array and call a helper function to fetch the firmware versions and name strings for the requested type. Then each use case can just call the helpers to get what it needs. Alex > > If you think you need a new file for this backend it should be fine. > > > something like: > > amdgpu_debug_core.c:: > > struct amdgpu_core_debug_info { > > /* Superset of all the info you are collecting from HW */ > > }; > > - amdgpu_debug_generate_core_info > > { > > /* This function collects the core debug info from HW and saves in > amdgpu_core_debug_info, > > we can update this periodically regardless of a request */ > > } > > and then: > > devcore_info *amdgpu_debug_pack_for_devcore(core_debug_info) > > { > > /* convert core debug info into devcore aligned format/data */ > > } > > ioctl_info *amdgpu_debug_pack_for_info_ioctl(core_debug_info) > > { > > /* convert core debug info into info IOCTL aligned format/data */ > > } > > debugfs_info *amdgpu_debug_pack_for_debugfs(core_debug_info) > > { > > /* convert core debug info into debugfs aligned format/data */ > > } > > - Shashank > > > > > > > > > Info IOCTL does have a lot of information which also is in pipeline to > > be dumped but this if we want to reuse the functionality of IOCTL we > > need to reorganize a lot of code. > > > > If that is the need of the hour i could work on that. Please let me know. > > > > This is my suggestion if it makes sense: > > > > 1. If we want to reuse a lot of functionality then we need to > > modularize some of the functions further so they could be consumed > > directly by devcoredump. > > 2. We should also have a dedicated file for devcoredump.c/.h so its > > easy to include headers of needed functionality cleanly and easy to > > expand devcoredump. > > 3. based on the priority and importance of this task we can add > > information else some repetition is a real possibility. > > > >> > >> Alex > >> > >> > >>> Signed-off-by: Sunil Khatri <sunil.khatri@xxxxxxx> > >>> --- > >>> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 122 > >>> ++++++++++++++++++++++ > >>> 1 file changed, 122 insertions(+) > >>> > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c > >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c > >>> index 611fdb90a1fc..78ddc58aef67 100644 > >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c > >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c > >>> @@ -168,6 +168,123 @@ void amdgpu_coredump(struct amdgpu_device > >>> *adev, bool vram_lost, > >>> { > >>> } > >>> #else > >>> +static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev, > >>> struct drm_printer *p) > >>> +{ > >>> + uint32_t version; > >>> + uint32_t feature; > >>> + uint8_t smu_program, smu_major, smu_minor, smu_debug; > >>> + > >>> + drm_printf(p, "VCE feature version: %u, fw version: 0x%08x\n", > >>> + adev->vce.fb_version, adev->vce.fw_version); > >>> + drm_printf(p, "UVD feature version: %u, fw version: 0x%08x\n", > >>> + 0, adev->uvd.fw_version); > >>> + drm_printf(p, "GMC feature version: %u, fw version: 0x%08x\n", > >>> + 0, adev->gmc.fw_version); > >>> + drm_printf(p, "ME feature version: %u, fw version: 0x%08x\n", > >>> + adev->gfx.me_feature_version, > >>> adev->gfx.me_fw_version); > >>> + drm_printf(p, "PFP feature version: %u, fw version: 0x%08x\n", > >>> + adev->gfx.pfp_feature_version, > >>> adev->gfx.pfp_fw_version); > >>> + drm_printf(p, "CE feature version: %u, fw version: 0x%08x\n", > >>> + adev->gfx.ce_feature_version, > >>> adev->gfx.ce_fw_version); > >>> + drm_printf(p, "RLC feature version: %u, fw version: 0x%08x\n", > >>> + adev->gfx.rlc_feature_version, > >>> adev->gfx.rlc_fw_version); > >>> + > >>> + drm_printf(p, "RLC SRLC feature version: %u, fw version: > >>> 0x%08x\n", > >>> + adev->gfx.rlc_srlc_feature_version, > >>> + adev->gfx.rlc_srlc_fw_version); > >>> + drm_printf(p, "RLC SRLG feature version: %u, fw version: > >>> 0x%08x\n", > >>> + adev->gfx.rlc_srlg_feature_version, > >>> + adev->gfx.rlc_srlg_fw_version); > >>> + drm_printf(p, "RLC SRLS feature version: %u, fw version: > >>> 0x%08x\n", > >>> + adev->gfx.rlc_srls_feature_version, > >>> + adev->gfx.rlc_srls_fw_version); > >>> + drm_printf(p, "RLCP feature version: %u, fw version: 0x%08x\n", > >>> + adev->gfx.rlcp_ucode_feature_version, > >>> + adev->gfx.rlcp_ucode_version); > >>> + drm_printf(p, "RLCV feature version: %u, fw version: 0x%08x\n", > >>> + adev->gfx.rlcv_ucode_feature_version, > >>> + adev->gfx.rlcv_ucode_version); > >>> + drm_printf(p, "MEC feature version: %u, fw version: 0x%08x\n", > >>> + adev->gfx.mec_feature_version, > >>> + adev->gfx.mec_fw_version); > >>> + > >>> + if (adev->gfx.mec2_fw) > >>> + drm_printf(p, > >>> + "MEC2 feature version: %u, fw version: > >>> 0x%08x\n", > >>> + adev->gfx.mec2_feature_version, > >>> + adev->gfx.mec2_fw_version); > >>> + > >>> + drm_printf(p, "IMU feature version: %u, fw version: 0x%08x\n", > >>> + 0, adev->gfx.imu_fw_version); > >>> + drm_printf(p, "PSP SOS feature version: %u, fw version: > >>> 0x%08x\n", > >>> + adev->psp.sos.feature_version, > >>> + adev->psp.sos.fw_version); > >>> + drm_printf(p, "PSP ASD feature version: %u, fw version: > >>> 0x%08x\n", > >>> + adev->psp.asd_context.bin_desc.feature_version, > >>> + adev->psp.asd_context.bin_desc.fw_version); > >>> + > >>> + drm_printf(p, "TA XGMI feature version: 0x%08x, fw version: > >>> 0x%08x\n", > >>> + adev->psp.xgmi_context.context.bin_desc.feature_version, > >>> + adev->psp.xgmi_context.context.bin_desc.fw_version); > >>> + drm_printf(p, "TA RAS feature version: 0x%08x, fw version: > >>> 0x%08x\n", > >>> + adev->psp.ras_context.context.bin_desc.feature_version, > >>> + adev->psp.ras_context.context.bin_desc.fw_version); > >>> + drm_printf(p, "TA HDCP feature version: 0x%08x, fw version: > >>> 0x%08x\n", > >>> + adev->psp.hdcp_context.context.bin_desc.feature_version, > >>> + adev->psp.hdcp_context.context.bin_desc.fw_version); > >>> + drm_printf(p, "TA DTM feature version: 0x%08x, fw version: > >>> 0x%08x\n", > >>> + adev->psp.dtm_context.context.bin_desc.feature_version, > >>> + adev->psp.dtm_context.context.bin_desc.fw_version); > >>> + drm_printf(p, "TA RAP feature version: 0x%08x, fw version: > >>> 0x%08x\n", > >>> + adev->psp.rap_context.context.bin_desc.feature_version, > >>> + adev->psp.rap_context.context.bin_desc.fw_version); > >>> + drm_printf(p, "TA SECURE DISPLAY feature version: 0x%08x, fw > >>> version: 0x%08x\n", > >>> + adev->psp.securedisplay_context.context.bin_desc.feature_version, > >>> + adev->psp.securedisplay_context.context.bin_desc.fw_version); > >>> + > >>> + /* SMC firmware */ > >>> + version = adev->pm.fw_version; > >>> + > >>> + smu_program = (version >> 24) & 0xff; > >>> + smu_major = (version >> 16) & 0xff; > >>> + smu_minor = (version >> 8) & 0xff; > >>> + smu_debug = (version >> 0) & 0xff; > >>> + drm_printf(p, "SMC feature version: %u, program: %d, fw > >>> version: 0x%08x (%d.%d.%d)\n", > >>> + 0, smu_program, version, smu_major, smu_minor, > >>> smu_debug); > >>> + > >>> + /* SDMA firmware */ > >>> + for (int i = 0; i < adev->sdma.num_instances; i++) { > >>> + drm_printf(p, "SDMA%d feature version: %u, firmware > >>> version: 0x%08x\n", > >>> + i, adev->sdma.instance[i].feature_version, > >>> + adev->sdma.instance[i].fw_version); > >>> + } > >>> + > >>> + drm_printf(p, "VCN feature version: %u, fw version: 0x%08x\n", > >>> + 0, adev->vcn.fw_version); > >>> + drm_printf(p, "DMCU feature version: %u, fw version: 0x%08x\n", > >>> + 0, adev->dm.dmcu_fw_version); > >>> + drm_printf(p, "DMCUB feature version: %u, fw version: > >>> 0x%08x\n", > >>> + 0, adev->dm.dmcub_fw_version); > >>> + drm_printf(p, "PSP TOC feature version: %u, fw version: > >>> 0x%08x\n", > >>> + adev->psp.toc.feature_version, > >>> adev->psp.toc.fw_version); > >>> + > >>> + version = adev->mes.kiq_version & AMDGPU_MES_VERSION_MASK; > >>> + feature = (adev->mes.kiq_version & > >>> AMDGPU_MES_FEAT_VERSION_MASK) > >>> + >> > >>> AMDGPU_MES_FEAT_VERSION_SHIFT; > >>> + drm_printf(p, "MES_KIQ feature version: %u, fw version: > >>> 0x%08x\n", > >>> + feature, version); > >>> + > >>> + version = adev->mes.sched_version & AMDGPU_MES_VERSION_MASK; > >>> + feature = (adev->mes.sched_version & > >>> AMDGPU_MES_FEAT_VERSION_MASK) > >>> + >> > >>> AMDGPU_MES_FEAT_VERSION_SHIFT; > >>> + drm_printf(p, "MES feature version: %u, fw version: 0x%08x\n", > >>> + feature, version); > >>> + > >>> + drm_printf(p, "VPE feature version: %u, fw version: 0x%08x\n", > >>> + adev->vpe.feature_version, adev->vpe.fw_version); > >>> + > >>> +} > >>> + > >>> static ssize_t > >>> amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count, > >>> void *data, size_t datalen) > >>> @@ -215,6 +332,11 @@ amdgpu_devcoredump_read(char *buffer, loff_t > >>> offset, size_t count, > >>> } > >>> } > >>> > >>> + if (coredump->adev) { > >>> + drm_printf(&p, "IP Firmwares\n"); > >>> + amdgpu_devcoredump_fw_info(coredump->adev, &p); > >>> + } > >>> + > >>> if (coredump->ring) { > >>> drm_printf(&p, "\nRing timed out details\n"); > >>> drm_printf(&p, "IP Type: %d Ring Name: %s\n", > >>> -- > >>> 2.34.1 > >>>