Ah, crap yeah that won't work since we don't free the ring. Key point is we need to distinct between the ring doesn't work temporary because we are in a GPU reset and it doesn't work at all because we are missing firmware or stuff like that. And no, checking the gpu_reset flag is totally racy and can't be done either. How about checking accel_working instead? Christian. Am 01.03.2018 um 07:01 schrieb Liu, Monk: >> Please change the test to use ring->ring_obj instead, this way we still bail out if somebody tries to submit commands before the ring is even allocated. > I don't understand how could fill_buffer() get run under the case that ring->ring_obj is not even allocated ... where is such case ? > > > /Monk > > -----Original Message----- > From: Koenig, Christian > Sent: 2018å¹´2æ??28æ?¥ 20:46 > To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org > Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready for fill_buffer > > Good point, but in this case we need some other handling. > > Please change the test to use ring->ring_obj instead, this way we still bail out if somebody tries to submit commands before the ring is even allocated. > > And you also need to fix a couple of other places in amdgpu_ttm.c. > > Regards, > Christian. > > Am 28.02.2018 um 13:34 schrieb Liu, Monk: >> Because when SDMA was hang by like process A, and meanwhile another >> process B is already running into the code of fill_buffer() So just let process B continue, don't block it otherwise process B would fail by software reason . >> >> Let it run and finally process B's job would fail and GPU recover will >> repeat it again (since it is a kernel job) >> >> Without this solution other process will be greatly harmed by one >> black sheep that triggering GPU recover >> >> /Monk >> >> >> >> -----Original Message----- >> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com] >> Sent: 2018å¹´2æ??28æ?¥ 20:24 >> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org >> Subject: Re: [PATCH 3/4] drm/amdgpu: don't return when ring not ready >> for fill_buffer >> >> Am 28.02.2018 um 08:21 schrieb Monk Liu: >>> because this time SDMA may under GPU RESET so its ring->ready may not >>> true, keep going and GPU scheduler will reschedule this job if it >>> failed. >>> >>> give a warning on copy_buffer when go through direct_submit while >>> ring->ready is false >> NAK, that test has already saved us quite a bunch of trouble with the fb layer. >> >> Why exactly are you running into issues with that? >> >> Christian. >> >>> Change-Id: Ife6cd55e0e843d99900e5bed5418499e88633685 >>> Signed-off-by: Monk Liu <Monk.Liu at amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +----- >>> 1 file changed, 1 insertion(+), 5 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>> index e38e6db..7b75ac9 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c >>> @@ -1656,6 +1656,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, >>> amdgpu_ring_pad_ib(ring, &job->ibs[0]); >>> WARN_ON(job->ibs[0].length_dw > num_dw); >>> if (direct_submit) { >>> + WARN_ON(!ring->ready); >>> r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs, >>> NULL, fence); >>> job->fence = dma_fence_get(*fence); @@ -1692,11 +1693,6 @@ int >>> amdgpu_fill_buffer(struct amdgpu_bo *bo, >>> struct amdgpu_job *job; >>> int r; >>> >>> - if (!ring->ready) { >>> - DRM_ERROR("Trying to clear memory with ring turned off.\n"); >>> - return -EINVAL; >>> - } >>> - >>> if (bo->tbo.mem.mem_type == TTM_PL_TT) { >>> r = amdgpu_ttm_alloc_gart(&bo->tbo); >>> if (r) > _______________________________________________ > amd-gfx mailing list > amd-gfx at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx