Re: "ring gfx timeout" with Vega 64 on mesa 19.0.0-rc2 and kernel 5.0.0-rc6 (GPU reset still not works)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/13/19 1:27 PM, Mikhail Gavrilov wrote:


On Wed, 13 Feb 2019 at 20:20, Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx> wrote:
>
> OK, just apply the following to your amdgpu_dm_do_flip function and see
> if GPU reset does proceed after you experience the hang.
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index d59bafc..586301f 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -4809,7 +4809,7 @@ static void amdgpu_dm_do_flip(struct
> drm_atomic_state *state,
>
>                          /* Wait for all fences on this FB */
> WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, false,
> - MAX_SCHEDULE_TIMEOUT) < 0);
> +                                       msecs_to_jiffies(5000)) < 0);
>
>                          amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
>
> Andrey
>

With this change situation extremely improved. I am even could reboot computer after it without pushing power button for 5 seconds.

But:
1) I am still couldn't return to desktop GNOME shell (Not Alt -Tab nor "Win" button does not help) (maybe it was not supposed?)
Anyway currently after GPU resset possible work only in terminal.

I think that expected as you need to restart X after this.

Regarding the original VM_FAULT we can try to debug that a bit to - enable this from trace-cmd

sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"

and when the hang happens

as root
cd /sys/kernel/debug/tracing && cat trace > event_dump

+ as usual would be nice to have the relevant wave dump and registers from UMR + dmesg.

Andrey

IMG_20190213_223903_DRO.jpg
On photo how looks monitor after GPU reset.


2) I got a lot of messages  "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!" I suppose not everything went according to plan.

I attached new kernel logs.

--
Best Regards,
Mike Gavrilov.
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux