Well, a perfect example of why we shouldn't look at what userspace currently does, but rather at what the hardware can do, when we design the IOCTLs.

I've been trying to raise awareness of this for quite a while now, but a lot of people seem to think that when the UMD doesn't do something, the kernel doesn't need to handle that situation.

Thanks for that info,
Christian.

On 29.08.2016 at 10:28, Liu, Monk wrote:
> But speaking from a practical standpoint: at least the closed-source OCL UMD doesn't use the CE at all, and I guess Mesa doesn't either ...
>
> BR Monk
>
> -----Original Message-----
> From: Christian König [mailto:deathsimple at vodafone.de]
> Sent: Monday, August 29, 2016 4:25 PM
> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH 2/4] drm/amdgpu:new method to sync ce&de
>
> Hui what?
>
> It feels like I've asked a hundred times whether the compute engine has a CE as well, but so far the answer was always no. That would explain a whole bunch of problems we've had with the compute rings as well.
>
> In this case the patch is good as it is; please just rebase it.
>
> Christian.
>
> On 29.08.2016 at 10:14, Liu, Monk wrote:
>> No, the compute ring can also leverage the constant engine; that depends on the OCL UMD's behavior. I just want to make sure the KMD does nothing wrong.
>>
>> BR Monk
>>
>> -----Original Message-----
>> From: Christian König [mailto:deathsimple at vodafone.de]
>> Sent: Monday, August 29, 2016 4:06 PM
>> To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH 2/4] drm/amdgpu:new method to sync ce&de
>>
>> On 29.08.2016 at 04:55, Monk Liu wrote:
>>> The CE and DE can have a gap of at most 128 dw between them, so to sync the CE and DE we no longer need the double SWITCH_BUFFER, which is ugly and harms performance; we only need to insert 128 NOPs after the VM flush to prevent a CE VM fault.
>>>
>>> Change-Id: Ibec954ce4c817ad7d3bce89c2bcb95b6c6bb5411
>>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>>
>> Looks good to me, but only the GFX engines have a CE, so syncing on the compute engines is pretty much pointless.
>>
>> So I suggest that you move this into the "usepfp" if branch as well.
>>
>> With that fixed, the patch is Reviewed-by: Christian König <christian.koenig at amd.com>.
>>
>> Regards,
>> Christian.
>>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 18 ++++++------------
>>>  1 file changed, 6 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> index 26fced0..ce1e616 100755
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> @@ -6005,14 +6005,6 @@ static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
>>>  	amdgpu_ring_write(ring, seq);
>>>  	amdgpu_ring_write(ring, 0xffffffff);
>>>  	amdgpu_ring_write(ring, 4); /* poll interval */
>>> -
>>> -	if (usepfp) {
>>> -		/* synce CE with ME to prevent CE fetch CEIB before context switch done */
>>> -		amdgpu_ring_write(ring, PACKET3(PACKET3_SWITCH_BUFFER, 0));
>>> -		amdgpu_ring_write(ring, 0);
>>> -		amdgpu_ring_write(ring, PACKET3(PACKET3_SWITCH_BUFFER, 0));
>>> -		amdgpu_ring_write(ring, 0);
>>> -	}
>>>  }
>>>
>>>  static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>> @@ -6020,6 +6012,9 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>>  {
>>>  	int usepfp = (ring->type == AMDGPU_RING_TYPE_GFX);
>>>
>>> +	/* GFX8 emits 128 dw nop to prevent DE do vm_flush before CE finish CEIB */
>>> +	amdgpu_ring_insert_nop(ring, 128);
>>> +
>>>  	amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
>>>  	amdgpu_ring_write(ring, (WRITE_DATA_ENGINE_SEL(usepfp) |
>>>  				 WRITE_DATA_DST_SEL(0)) |
>>> @@ -6059,11 +6054,10 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>>  		/* sync PFP to ME, otherwise we might get invalid PFP reads */
>>>  		amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
>>>  		amdgpu_ring_write(ring, 0x0);
>>> -		amdgpu_ring_write(ring, PACKET3(PACKET3_SWITCH_BUFFER, 0));
>>> -		amdgpu_ring_write(ring, 0);
>>> -		amdgpu_ring_write(ring, PACKET3(PACKET3_SWITCH_BUFFER, 0));
>>> -		amdgpu_ring_write(ring, 0);
>>>  	}
>>> +
>>> +	/* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */
>>> +	amdgpu_ring_insert_nop(ring, 128);
>>>  }
>>>
>>>  static u32 gfx_v8_0_ring_get_rptr_compute(struct amdgpu_ring *ring)
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx