On 10.05.2017 at 11:00, zhoucm1 wrote:
>
> On 10.05.2017 at 16:50, Christian König wrote:
>> On 10.05.2017 at 10:38, zhoucm1 wrote:
>>>
>>> On 10.05.2017 at 16:26, Christian König wrote:
>>>> On 10.05.2017 at 09:31, Chunming Zhou wrote:
>>>>> This is an improvement on the previous patch: sched_sync stores
>>>>> fences that could be skipped as scheduled. When the job is
>>>>> executed, we don't need a pipeline_sync if all fences in
>>>>> sched_sync are signalled; otherwise we still insert a
>>>>> pipeline_sync.
>>>>>
>>>>> Change-Id: I26d3a2794272ba94b25753d4bf367326d12f6939
>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>>> ---
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 7 ++++++-
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 ++++-
>>>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> index 787acd7..ef018bf 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> @@ -1162,6 +1162,7 @@ struct amdgpu_job {
>>>>>  	struct amdgpu_vm	*vm;
>>>>>  	struct amdgpu_ring	*ring;
>>>>>  	struct amdgpu_sync	sync;
>>>>> +	struct amdgpu_sync	sched_sync;
>>>>>  	struct amdgpu_ib	*ibs;
>>>>>  	struct fence		*fence; /* the hw fence */
>>>>>  	uint32_t		preamble_status;
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>> index 2c6624d..86ad507 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>> @@ -121,6 +121,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>>>  {
>>>>>  	struct amdgpu_device *adev = ring->adev;
>>>>>  	struct amdgpu_ib *ib = &ibs[0];
>>>>> +	struct fence *tmp;
>>>>>  	bool skip_preamble, need_ctx_switch;
>>>>>  	unsigned patch_offset = ~0;
>>>>>  	struct amdgpu_vm *vm;
>>>>> @@ -167,8 +168,12 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>>>  		return r;
>>>>>  	}
>>>>>
>>>>> -	if (ring->funcs->emit_pipeline_sync && job && job->need_pipeline_sync)
>>>>> +	if (ring->funcs->emit_pipeline_sync && job &&
>>>>> +	    (tmp = amdgpu_sync_get_fence(&job->sched_sync))) {
>>>>> +		job->need_pipeline_sync = true;
>>>>>  		amdgpu_ring_emit_pipeline_sync(ring);
>>>>> +		fence_put(tmp);
>>>>> +	}
>>>>>
>>>>>  	if (vm) {
>>>>>  		amdgpu_ring_insert_nop(ring, extra_nop); /* prevent CE go too fast than DE */
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> index cfa97ab..fa0c8b1 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>> @@ -60,6 +60,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>>>>>  	(*job)->need_pipeline_sync = false;
>>>>>
>>>>>  	amdgpu_sync_create(&(*job)->sync);
>>>>> +	amdgpu_sync_create(&(*job)->sched_sync);
>>>>>
>>>>>  	return 0;
>>>>>  }
>>>>> @@ -98,6 +99,7 @@ static void amdgpu_job_free_cb(struct amd_sched_job *s_job)
>>>>>  	fence_put(job->fence);
>>>>>  	amdgpu_sync_free(&job->sync);
>>>>> +	amdgpu_sync_free(&job->sched_sync);
>>>>>  	kfree(job);
>>>>>  }
>>>>> @@ -107,6 +109,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
>>>>>  	fence_put(job->fence);
>>>>>  	amdgpu_sync_free(&job->sync);
>>>>> +	amdgpu_sync_free(&job->sched_sync);
>>>>>  	kfree(job);
>>>>>  }
>>>>> @@ -154,7 +157,7 @@ static struct fence *amdgpu_job_dependency(struct amd_sched_job *sched_job)
>>>>>  	}
>>>>>
>>>>>  	if (amd_sched_dependency_optimized(fence, sched_job->s_entity))
>>>>> -		job->need_pipeline_sync = true;
>>>>> +		amdgpu_sync_fence(job->adev, &job->sched_sync, fence);
>>>>
>>>> This can result in an -ENOMEM,
>>> will handle it.
>>>> and in addition to that we only need to remember the last fence
>>>> optimized like this, not all of them.
>>>>
>>>> So just keep the last one found here in job->sched_fence instead.
>>> I guess this isn't enough.
>>> The dependencies are not returned in order, so the last one is not
>>> always the last scheduled fence.
>>> And they could be sched fences rather than hw fences; although they
>>> are handled by the same hw ring, the sched fence contexts aren't the
>>> same.
>>> So we still need sched_sync here, right?
>>
>> No, amdgpu_job_dependency is only called again when the returned
>> fence is signaled (or scheduled on the same ring).
> Let me give an example:
> Assume job->sync has two fences (fenceA and fenceB) which could be
> scheduled. fenceA is from entity1 and fenceB is from entity2, both
> for the gfx engine, but fenceA could be submitted to the hw ring
> after fenceB.
> The order in the job->sync list is: others ----> fenceA ----> fenceB ----> others.
> When amdgpu_job_dependency is called, fenceA will be checked first,
> and then fenceB.
>
> Following your proposal, we would only store fenceB, but fenceA is
> the later one, which isn't expected.

Ah! Indeed, I didn't realize that the dependent fence could have
already been scheduled.

Mhm, how are we going to handle the out-of-memory situation then?
Since we are inside a kernel thread, we are not supposed to fail at
this point.

Regards,
Christian.

> Regards,
> David Zhou
>>
>> So when this is called and you find that you need to wait for
>> another fence, the order is guaranteed.
>>
>> Regards,
>> Christian.
>>
>>> Regards,
>>> David Zhou
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>  	return fence;
>>>>>  }