On Mon, Oct 28, 2024 at 11:41 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote:
>
>
>
> On 10/28/2024 8:11 PM, Alex Deucher wrote:
> > Ping?
> >
> > On Fri, Oct 18, 2024 at 11:47 AM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
> >>
> >> Ping?
> >>
> >> On Tue, Oct 15, 2024 at 2:28 PM Alex Deucher <alexander.deucher@xxxxxxx> wrote:
> >>>
> >>> Add messages to make it clear when a per ring reset
> >>> happens. This is helpful for debugging and aligns with
> >>> other reset methods.
> >>>
> >>> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
> >>> ---
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> index 102742f1faa2..2d60552a13ac 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> @@ -137,6 +137,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
> >>> /* attempt a per ring reset */
> >>> if (amdgpu_gpu_recovery &&
> >>> ring->funcs->reset) {
> >>> + dev_err(adev->dev, "Starting %s ring reset\n", s_job->sched->name);
>
> Is dev_err intentional or dev_info is good enough? Also, suggest to add
> ring name to fail/pass messages.

I was being consistent with the other messages from this function.
They are all dev_err. Will add the ring name.

Thanks,

Alex

>
> Thanks,
> Lijo
>
> >>> /* stop the scheduler, but don't mess with the
> >>> * bad job yet because if ring reset fails
> >>> * we'll fall back to full GPU reset.
> >>> @@ -150,8 +151,10 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
> >>> amdgpu_fence_driver_force_completion(ring);
> >>> if (amdgpu_ring_sched_ready(ring))
> >>> drm_sched_start(&ring->sched);
> >>> + dev_err(adev->dev, "Ring reset success\n");
> >>> goto exit;
> >>> }
> >>> + dev_err(adev->dev, "Ring reset failure\n");
> >>> }
> >>>
> >>> if (amdgpu_device_should_recover_gpu(ring->adev)) {
> >>> --
> >>> 2.46.2
> >>>
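For illustration only, here is a minimal user-space sketch of the review suggestion to put the ring name into the pass/fail messages, so they match the "Starting %s ring reset" line. It is not the follow-up patch. struct mock_ring, format_ring_reset_msg() and mock_dev_err_ring_reset() are hypothetical stand-ins, dev_err() is replaced by a plain write to stderr, and the "Ring %s reset success/failure" wording is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an amdgpu ring: only the scheduler name is
 * modeled (the patch prints s_job->sched->name in the "Starting" message). */
struct mock_ring {
	const char *sched_name;
};

/* Format the reset result with the ring name included.
 * The exact wording is an assumption, not taken from the final patch. */
static int format_ring_reset_msg(char *buf, size_t len,
				 const struct mock_ring *ring, int success)
{
	assert(ring != NULL && ring->sched_name != NULL);
	return snprintf(buf, len, "Ring %s reset %s\n", ring->sched_name,
			success ? "success" : "failure");
}

/* User-space stand-in for dev_err(): print the formatted line to stderr. */
static void mock_dev_err_ring_reset(const struct mock_ring *ring, int success)
{
	char msg[128];

	format_ring_reset_msg(msg, sizeof(msg), ring, success);
	fputs(msg, stderr);
}
```

With the name in both the pass and fail lines, each result can be paired with its "Starting %s ring reset" message even when several rings time out close together. The ring name "gfx_0.0.0" used when exercising this sketch is only an example value.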