What about this for fill_buffer():

	if (ring->ready != true && !adev->in_gpu_reset) {
		return -EINVAL;
	}

/Monk

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Liu, Monk
Sent: 2018-02-28 20:35
To: Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH 3/4] drm/amdgpu: don't return when ring not ready for fill_buffer

Because when the SDMA is hung by one process A, another process B may already be running inside fill_buffer().

So just let process B continue; don't block it, otherwise process B would fail for a purely software reason. Let it run: if process B's job eventually fails, GPU recovery will resubmit it (since it is a kernel job).

Without this change, every other process is badly hurt by the one black sheep that triggered the GPU recovery.

/Monk

-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumerken@xxxxxxxxx]
Sent: 2018-02-28 20:24
To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready for fill_buffer

Am 28.02.2018 um 08:21 schrieb Monk Liu:
> because at this point the SDMA may be under GPU RESET, so its ring->ready may
> not be true; keep going and the GPU scheduler will reschedule this job if it
> fails.
>
> give a warning in copy_buffer when going through direct_submit while
> ring->ready is false

NAK, that test has already saved us quite a bunch of trouble with the fb layer.

Why exactly are you running into issues with that?

Christian.

>
> Change-Id: Ife6cd55e0e843d99900e5bed5418499e88633685
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e38e6db..7b75ac9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1656,6 +1656,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
>   	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
>   	WARN_ON(job->ibs[0].length_dw > num_dw);
>   	if (direct_submit) {
> +		WARN_ON(!ring->ready);
>   		r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs,
>   				       NULL, fence);
>   		job->fence = dma_fence_get(*fence);
> @@ -1692,11 +1693,6 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
>   	struct amdgpu_job *job;
>   	int r;
>
> -	if (!ring->ready) {
> -		DRM_ERROR("Trying to clear memory with ring turned off.\n");
> -		return -EINVAL;
> -	}
> -
>   	if (bo->tbo.mem.mem_type == TTM_PL_TT) {
>   		r = amdgpu_ttm_alloc_gart(&bo->tbo);
>   		if (r)
_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
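
[Editor's note] For clarity, here is a minimal sketch of how the check Monk proposes at the top of this thread might look inside amdgpu_fill_buffer(), replacing the test the patch removes. It assumes the device pointer is reachable via ring->adev and that struct amdgpu_device carries the in_gpu_reset flag mentioned in the thread; the field names follow the discussion and are illustrative only, not the final patch:

	/* Sketch only, assuming ring->adev and adev->in_gpu_reset exist
	 * as discussed in this thread.
	 *
	 * Reject the fill when the ring is not ready, unless a GPU reset
	 * is in progress.  During a reset the ring is deliberately marked
	 * not ready, but the job should still be submitted so the GPU
	 * scheduler can re-run it once recovery has finished.
	 */
	struct amdgpu_device *adev = ring->adev;	/* assumed field */

	if (!ring->ready && !adev->in_gpu_reset) {
		DRM_ERROR("Trying to clear memory with ring turned off.\n");
		return -EINVAL;
	}

This keeps Christian's safeguard for the ordinary "ring turned off" case (e.g. the fb layer issue he mentions) while still letting jobs issued during a GPU reset reach the scheduler, which is the behaviour Monk argues for.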