On 2017-05-10 17:21, Christian König wrote:
> On 10.05.2017 at 11:00, zhoucm1 wrote:
>>
>> On 2017-05-10 16:50, Christian König wrote:
>>> On 10.05.2017 at 10:38, zhoucm1 wrote:
>>>>
>>>> On 2017-05-10 16:26, Christian König wrote:
>>>>> On 10.05.2017 at 09:31, Chunming Zhou wrote:
>>>>>> This is an improvement on the previous patch. The new sched_sync
>>>>>> stores fences that could be skipped because they were already
>>>>>> scheduled. When the job is executed, we don't need a pipeline_sync
>>>>>> if all fences in sched_sync have signalled; otherwise we still
>>>>>> insert the pipeline_sync.
>>>>>>
>>>>>> Change-Id: I26d3a2794272ba94b25753d4bf367326d12f6939
>>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 7 ++++++-
>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 ++++-
>>>>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> index 787acd7..ef018bf 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> @@ -1162,6 +1162,7 @@ struct amdgpu_job {
>>>>>>  	struct amdgpu_vm	*vm;
>>>>>>  	struct amdgpu_ring	*ring;
>>>>>>  	struct amdgpu_sync	sync;
>>>>>> +	struct amdgpu_sync	sched_sync;
>>>>>>  	struct amdgpu_ib	*ibs;
>>>>>>  	struct fence		*fence; /* the hw fence */
>>>>>>  	uint32_t		preamble_status;
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>>> index 2c6624d..86ad507 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>>> @@ -121,6 +121,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>>>>  {
>>>>>>  	struct amdgpu_device *adev = ring->adev;
>>>>>>  	struct amdgpu_ib *ib = &ibs[0];
>>>>>> +	struct fence *tmp;
>>>>>>  	bool skip_preamble, need_ctx_switch;
>>>>>>  	unsigned patch_offset = ~0;
>>>>>>  	struct amdgpu_vm *vm;
>>>>>> @@ -167,8 +168,12 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>>>>  		return r;
>>>>>>  	}
>>>>>>
>>>>>> -	if (ring->funcs->emit_pipeline_sync && job && job->need_pipeline_sync)
>>>>>> +	if (ring->funcs->emit_pipeline_sync && job &&
>>>>>> +	    (tmp = amdgpu_sync_get_fence(&job->sched_sync))) {
>>>>>> +		job->need_pipeline_sync = true;
>>>>>>  		amdgpu_ring_emit_pipeline_sync(ring);
>>>>>> +		fence_put(tmp);
>>>>>> +	}
>>>>>>
>>>>>>  	if (vm) {
>>>>>>  		amdgpu_ring_insert_nop(ring, extra_nop); /* prevent CE go too fast than DE */
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> index cfa97ab..fa0c8b1 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> @@ -60,6 +60,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>>>>>>  	(*job)->need_pipeline_sync = false;
>>>>>>
>>>>>>  	amdgpu_sync_create(&(*job)->sync);
>>>>>> +	amdgpu_sync_create(&(*job)->sched_sync);
>>>>>>
>>>>>>  	return 0;
>>>>>>  }
>>>>>> @@ -98,6 +99,7 @@ static void amdgpu_job_free_cb(struct amd_sched_job *s_job)
>>>>>>  	fence_put(job->fence);
>>>>>>  	amdgpu_sync_free(&job->sync);
>>>>>> +	amdgpu_sync_free(&job->sched_sync);
>>>>>>  	kfree(job);
>>>>>>  }
>>>>>> @@ -107,6 +109,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
>>>>>>  	fence_put(job->fence);
>>>>>>  	amdgpu_sync_free(&job->sync);
>>>>>> +	amdgpu_sync_free(&job->sched_sync);
>>>>>>  	kfree(job);
>>>>>>  }
>>>>>> @@ -154,7 +157,7 @@ static struct fence *amdgpu_job_dependency(struct amd_sched_job *sched_job)
>>>>>>  	}
>>>>>>
>>>>>>  	if (amd_sched_dependency_optimized(fence, sched_job->s_entity))
>>>>>> -		job->need_pipeline_sync = true;
>>>>>> +		amdgpu_sync_fence(job->adev, &job->sched_sync, fence);
>>>>>
>>>>> This can result in an -ENOMEM
>>>> Will handle it.
>>>>
>>>>> and additionally we only need to remember the last fence optimized
>>>>> like this, not all of them.
>>>>>
>>>>> So just keep the last one found here in job->sched_fence instead.
>>>> I guess this isn't enough.
>>>> The dependencies are not in order when this is called, so the last
>>>> one is not always the last scheduled fence.
>>>> And they could be sched fences, not hw fences; although they are
>>>> handled by the same hw ring, their sched fence contexts aren't the
>>>> same. So we still need sched_sync here, right?
>>>
>>> No, amdgpu_job_dependency is only called again when the returned
>>> fence is signaled (or scheduled on the same ring).
>> Let me give an example:
>> Assume job->sync has two fences (fenceA and fenceB) that could be
>> scheduled. fenceA is from entity1 and fenceB is from entity2, both
>> for the gfx engine, but fenceA could be submitted to the hw ring
>> after fenceB.
>> The order in the job->sync list is: others -> fenceA -> fenceB -> others.
>> When amdgpu_job_dependency is called, fenceA is checked first, then
>> fenceB.
>>
>> Following your proposal, we would only store fenceB, but fenceA is
>> the later one, which isn't what we want.
>
> Ah! Indeed, I didn't realize that the dependent fence could have
> already been scheduled.
>
> Mhm, how are we going to handle the out-of-memory situation then?
> Since we are inside a kernel thread, we are not supposed to fail at
> this point.
Like the failed-VMID-grab case: add a DRM_ERROR. Is that OK?

Regards,
David Zhou
>
> Regards,
> Christian.
>
>>
>> Regards,
>> David Zhou
>>>
>>> So when this is called and you find that you need to wait for
>>> another fence, the order is guaranteed.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Regards,
>>>> David Zhou
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>  	return fence;
>>>>>>  }