On Mon, Nov 13, 2017 at 6:24 AM, Julien Isorce <julien.isorce at gmail.com> wrote:
> Hi Alex,
>
> Thx for your reply, but in all of the cases you mentioned, the user would
> still be able to reboot properly (i.e. typing reboot or a magic keyboard key)
> or to have a trace of a kernel panic if it happens, is it correct ?

Yes, the deadlock in the GPU scheduler was the issue preventing that
from working properly.

Alex

>
> Thx
> Julien
>
> On 9 November 2017 at 18:08, Alex Deucher <alexdeucher at gmail.com> wrote:
>>
>> On Thu, Nov 9, 2017 at 4:35 AM, Julien Isorce <julien.isorce at gmail.com>
>> wrote:
>> > Hi Monk.
>> >
>> > I am interested on this. Currently when a "ring X stalled for more than
>> > N sec" happens it usually goes into the gpu reset routine.
>> > Does it always cause the vram to be lost ? Could you explain what
>> > happens if the vram remains lost ?
>>
>> It means the contents of vram are gone or unreliable.  In that case
>> applications need to re-initialize all of their buffers before
>> submitting any work.  You really need to add GL_robustness support to
>> any applications you care about.  Whether vram is lost or not depends
>> on the reset method and the asic.  E.g., soft reset of a specific
>> engine won't cause a loss of vram, but a full adapter reset or an FLR
>> may.
>>
>> >
>> > I am asking this because I experienced some recurrent gpu reset that are
>> > marked succeeded from the log but fail in the "resume" step.
>> > I would not be interested in this if it would always leave a chance to
>> > the user to cleanly reboot the machine.
>> >
>> > The issue is that it can require a hard reboot without kernel panic and
>> > without keeping the keyboard responding to magic keys.
>> > Are those patches trying to address this issue ?
>> >
>> > Note that here "issue" is not referring to the root cause of a ring X
>> > stalled and it is also not referring to why "resume" step fails.
>>
>> There were a few issues that caused problems with GPU reset.  The
>> biggest was that the GPU scheduler deadlocked in certain cases so if
>> you got a GPU hang, the driver locked up.  That should mostly be
>> straightened out at this point.  I think there may still be some
>> deadlocks in the modesetting code after a reset.  Once that is sorted,
>> it will come down to fine tuning the actual reset sequences.  Full
>> adapter resets are the easiest to get working reliably (and are
>> already implemented in the driver), but also the most destructive.
>>
>> Alex
>>
>> >
>> > Thx a lot
>> > Julien
>> >
>> >
>> > On 30 October 2017 at 04:15, Monk Liu <Monk.Liu at amd.com> wrote:
>> >>
>> >> *** job skipping logic in scheduler part is re-implemented ***
>> >>
>> >> Monk Liu (7):
>> >>   amd/scheduler:imple job skip feature(v3)
>> >>   drm/amdgpu:implement new GPU recover(v3)
>> >>   drm/amdgpu:cleanup in_sriov_reset and lock_reset
>> >>   drm/amdgpu:cleanup ucode_init_bo
>> >>   drm/amdgpu:block kms open during gpu_reset
>> >>   drm/amdgpu/sriov:fix memory leak in psp_load_fw
>> >>   drm/amdgpu:fix random missing of FLR NOTIFY
>> >>
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   9 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 311 ++++++++++++--------------
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  10 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c       |   2 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |  18 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   3 +
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       |  22 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c     |   4 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c      |   2 -
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h      |   2 -
>> >>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c         |   6 +-
>> >>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         |   6 +-
>> >>  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |  16 +-
>> >>  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c         |   2 +-
>> >>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c |  39 ++--
>> >>  15 files changed, 220 insertions(+), 232 deletions(-)
>> >>
>> >> --
>> >> 2.7.4
>> >>
>> >> _______________________________________________
>> >> amd-gfx mailing list
>> >> amd-gfx at lists.freedesktop.org
>> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
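
As a follow-up to the GL_robustness suggestion in the thread, here is a minimal sketch of the decision an application might make each frame after polling the context reset status. The four status values are the ones defined by the GL_ARB_robustness extension. Everything else is an assumption made for illustration: the helper name `action_after_reset_status`, the `FakeApp` class, and the idea of passing the status in as a plain integer. A real application would read the status from glGetGraphicsResetStatusARB on a context created with reset notification enabled.

```python
# Reset status values as defined by the GL_ARB_robustness extension.
GL_NO_ERROR = 0
GL_GUILTY_CONTEXT_RESET_ARB = 0x8253
GL_INNOCENT_CONTEXT_RESET_ARB = 0x8254
GL_UNKNOWN_CONTEXT_RESET_ARB = 0x8255

_RESET_STATUSES = (
    GL_GUILTY_CONTEXT_RESET_ARB,
    GL_INNOCENT_CONTEXT_RESET_ARB,
    GL_UNKNOWN_CONTEXT_RESET_ARB,
)


def action_after_reset_status(status):
    """Hypothetical helper: map a polled reset status to (must_recreate, guilty).

    must_recreate is True when the context was reset, in which case buffer
    contents must be treated as gone and re-initialized before any new work
    is submitted. guilty is True only when this context caused the reset.
    """
    if status == GL_NO_ERROR:
        return (False, False)
    if status in _RESET_STATUSES:
        return (True, status == GL_GUILTY_CONTEXT_RESET_ARB)
    raise ValueError("unexpected reset status: %#x" % status)


class FakeApp:
    """Hypothetical stand-in for an application that owns GPU buffers."""

    def __init__(self):
        self.buffers_valid = True
        self.reinit_count = 0

    def reinitialize_buffers(self):
        # In a real application: recreate the context, then re-upload all
        # textures, vertex buffers and other GPU-side state.
        self.buffers_valid = True
        self.reinit_count += 1

    def frame(self, polled_status):
        """Return True if work was submitted this frame."""
        must_recreate, _guilty = action_after_reset_status(polled_status)
        if must_recreate:
            self.buffers_valid = False
            self.reinitialize_buffers()
        return self.buffers_valid
```

The sketch treats all three reset statuses the same way, because in each case the application cannot rely on what was in vram. The guilty flag is returned separately so a caller could, for example, avoid resubmitting the workload that triggered the hang.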