On 17/08/2023 17:38, André Almeida wrote:
Em 17/08/2023 12:26, Shashank Sharma escreveu:
On 17/08/2023 17:17, André Almeida wrote:
Em 17/08/2023 12:04, Shashank Sharma escreveu:
On 17/08/2023 15:45, André Almeida wrote:
Hi Shashank,
Em 17/08/2023 03:41, Shashank Sharma escreveu:
Hello Andre,
On 15/08/2023 21:50, André Almeida wrote:
Instead of storing coredump information inside amdgpu_device
struct,
move if to a proper separated struct and allocate it
dynamically. This
will make it easier to further expand the logged information.
Signed-off-by: André Almeida <andrealmeid@xxxxxxxxxx>
---
v4: change kmalloc to kzalloc
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 14 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63
++++++++++++++--------
2 files changed, 49 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c6a332261ab..0d560b713948 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1088,11 +1088,6 @@ struct amdgpu_device {
uint32_t *reset_dump_reg_list;
uint32_t *reset_dump_reg_value;
int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
- struct amdgpu_task_info reset_task_info;
- bool reset_vram_lost;
- struct timespec64 reset_time;
-#endif
bool scpm_enabled;
uint32_t scpm_status;
@@ -1105,6 +1100,15 @@ struct amdgpu_device {
uint32_t aid_mask;
};
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+ struct amdgpu_device *adev;
+ struct amdgpu_task_info reset_task_info;
+ struct timespec64 reset_time;
+ bool reset_vram_lost;
+};
The patch looks good to me in general, but I would recommend
slightly different arrangement and segregation of GPU reset
information.
Please consider a higher level structure adev->gpu_reset_info,
and move everything related to reset dump info into that,
including this new coredump_info structure, something like this:
struct amdgpu_reset_info {
uint32_t *reset_dump_reg_list;
uint32_t *reset_dump_reg_value;
int num_regs;
Right, I can encapsulate there reset_dump members,
#ifdef CONFIG_DEV_COREDUMP
struct amdgpu_coredump_info *coredump_info;/* keep this
dynamic allocation */
but we don't need a pointer for amdgpu_coredump_info inside
amdgpu_device or inside of amdgpu_device->gpu_reset_info, right?
I think it would be better if we keep all of the GPU reset related
data in the same structure, so adev->gpu_reset_info->coredump_info
sounds about right to me.
But after patch 2/4, we don't need to store a coredump_info pointer
inside adev, this is what I meant. What would be the purpose of
having this pointer? It's freed by amdgpu_devcoredump_free(), so we
don't need to keep track of it.
Well, actually we are pulling in some 0parallel efforts on enhancing
the GPU reset information, and we were planning to use the coredump
info for some additional things. So if I have the coredump_info
available (like reset_task_info and vram_lost) across a few functions
in the driver with adev, it would make my job easy there :).
It seems dangerous to use an object with this limited lifetime to rely
to read on. If you want to use it you will need to change
amdgpu_devcoredump_free() to drop a reference or you will need to use
it statically, which defeats the purpose of this patch. Anyway, I'll
add it as you requested.
I guess if the coredump_free function can make the
adev->reset_info->coredump_info= NULL, after freeing it, that will
actually help the case.
While consuming it, I can simply check if
(adev->reset_info->coredump_info) is available to be read.
- Shashank
- Shashank
- Shashank
#endif
}
This will make sure that all the relevant information is at the
same place.
- Shashank
amdgpu_inc_vram_lost(tmp_adev);