[PATCH 1/4] drm/amdgpu: add sched sync for amdgpu job

david1.zhou@xxxxxxx (zhoucm1) · Wed, 10 May 2017 17:00:43 +0800

On 2017-05-10 16:50, Christian König wrote:
> On 2017-05-10 10:38, zhoucm1 wrote:
>>
>>
>> On 2017-05-10 16:26, Christian König wrote:
>>> On 2017-05-10 09:31, Chunming Zhou wrote:
>>>> This is an improvement on the previous patch. sched_sync stores the
>>>> fences that could be skipped as already scheduled. When the job is
>>>> executed, no pipeline_sync is needed if all fences in sched_sync have
>>>> signaled; otherwise, a pipeline_sync is still inserted.
>>>>
>>>> Change-Id: I26d3a2794272ba94b25753d4bf367326d12f6939
>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 7 ++++++-
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 ++++-
>>>>   3 files changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index 787acd7..ef018bf 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -1162,6 +1162,7 @@ struct amdgpu_job {
>>>>       struct amdgpu_vm    *vm;
>>>>       struct amdgpu_ring    *ring;
>>>>       struct amdgpu_sync    sync;
>>>> +    struct amdgpu_sync    sched_sync;
>>>>       struct amdgpu_ib    *ibs;
>>>>       struct fence        *fence; /* the hw fence */
>>>>       uint32_t        preamble_status;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>> index 2c6624d..86ad507 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>> @@ -121,6 +121,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>>   {
>>>>       struct amdgpu_device *adev = ring->adev;
>>>>       struct amdgpu_ib *ib = &ibs[0];
>>>> +    struct fence *tmp;
>>>>       bool skip_preamble, need_ctx_switch;
>>>>       unsigned patch_offset = ~0;
>>>>       struct amdgpu_vm *vm;
>>>> @@ -167,8 +168,12 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>>>           return r;
>>>>       }
>>>>
>>>> -    if (ring->funcs->emit_pipeline_sync && job && job->need_pipeline_sync)
>>>> +    if (ring->funcs->emit_pipeline_sync && job &&
>>>> +        (tmp = amdgpu_sync_get_fence(&job->sched_sync))) {
>>>> +        job->need_pipeline_sync = true;
>>>>           amdgpu_ring_emit_pipeline_sync(ring);
>>>> +        fence_put(tmp);
>>>> +    }
>>>>       if (vm) {
>>>>           amdgpu_ring_insert_nop(ring, extra_nop); /* prevent CE go too fast than DE */
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> index cfa97ab..fa0c8b1 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> @@ -60,6 +60,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
>>>>       (*job)->need_pipeline_sync = false;
>>>>
>>>>       amdgpu_sync_create(&(*job)->sync);
>>>> +    amdgpu_sync_create(&(*job)->sched_sync);
>>>>
>>>>       return 0;
>>>>   }
>>>> @@ -98,6 +99,7 @@ static void amdgpu_job_free_cb(struct amd_sched_job *s_job)
>>>>
>>>>       fence_put(job->fence);
>>>>       amdgpu_sync_free(&job->sync);
>>>> +    amdgpu_sync_free(&job->sched_sync);
>>>>       kfree(job);
>>>>   }
>>>>
>>>> @@ -107,6 +109,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
>>>>
>>>>       fence_put(job->fence);
>>>>       amdgpu_sync_free(&job->sync);
>>>> +    amdgpu_sync_free(&job->sched_sync);
>>>>       kfree(job);
>>>>   }
>>>>
>>>> @@ -154,7 +157,7 @@ static struct fence *amdgpu_job_dependency(struct amd_sched_job *sched_job)
>>>>       }
>>>>
>>>>       if (amd_sched_dependency_optimized(fence, sched_job->s_entity))
>>>> -        job->need_pipeline_sync = true;
>>>> +        amdgpu_sync_fence(job->adev, &job->sched_sync, fence);
>>>
>>> This can result in an -ENOMEM 
>> I will handle it.
>>> and in addition to that, we only need to remember the last fence
>>> optimized like this, not all of them.
>>>
>>> So just keep the last one found here in job->sched_fence instead.
>> I don't think that is enough.
>> The dependencies are not checked in submission order, so the last one
>> found is not always the last scheduled fence.
>> Also, they could be scheduler fences rather than hw fences; although they
>> are handled by the same hw ring, the scheduler fence contexts differ.
>> So we still need sched_sync here, right?
>
> No, amdgpu_job_dependency is only called again when the returned fence 
> is signaled (or scheduled on the same ring).
Let me give an example:
Assume job->sync has two fences (fenceA and fenceB) that could be treated
as scheduled. fenceA is from entity1 and fenceB is from entity2; both are
for the gfx engine, but fenceA could be submitted to the hw ring after fenceB.
The order in the job->sync list is: others ----> fenceA ----> fenceB ----> others.
When amdgpu_job_dependency is called, fenceA is checked first, and
then fenceB.

Following your proposal, we would store only fenceB, but fenceA is the
later one on the ring. That isn't what we want.
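The fenceA/fenceB scenario can be made concrete with a small user-space model. This is only a sketch under stated assumptions, not driver code: each fence is reduced to the position (hw_seq) at which its job reaches the hw ring, the ring is assumed to complete work strictly in that order, fences are assumed to be checked in job->sync list order as described above, and all order_* names are hypothetical. It compares remembering only the last optimized fence (keep-last) with remembering all of them (what sched_sync does).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model fence: only its position on the hw ring matters. */
struct order_fence {
	const char *name;
	unsigned hw_seq; /* order in which its job reaches the hw ring */
};

#define ORDER_MAX_FENCES 8

/* Keep-all approach: stands in for job->sched_sync. */
struct order_sync {
	const struct order_fence *fences[ORDER_MAX_FENCES];
	size_t count;
};

static void order_sync_add(struct order_sync *s, const struct order_fence *f)
{
	assert(s->count < ORDER_MAX_FENCES);
	s->fences[s->count++] = f;
}

/* Assumption: the ring completes work strictly in hw_seq order. */
static bool order_fence_signaled(const struct order_fence *f, unsigned completed_seq)
{
	return f->hw_seq <= completed_seq;
}

/* Keep-all: a pipeline sync is needed while any remembered fence is pending. */
static bool need_sync_keep_all(const struct order_sync *s, unsigned completed_seq)
{
	for (size_t i = 0; i < s->count; i++)
		if (!order_fence_signaled(s->fences[i], completed_seq))
			return true;
	return false;
}

/* Keep-last: only the most recently checked fence is remembered. */
static bool need_sync_keep_last(const struct order_fence *last, unsigned completed_seq)
{
	return last && !order_fence_signaled(last, completed_seq);
}
```

With fenceA at hw_seq 2 and fenceB at hw_seq 1, checked in list order (A, then B), keep-last remembers fenceB. Once the ring has completed position 1, need_sync_keep_last(&b, 1) reports that no sync is needed, while need_sync_keep_all still sees fenceA pending. Whether this ordering can actually occur in the scheduler is the point under dispute in this thread.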

Regards,
David Zhou
>
> So when this is called and you find that you need to wait for another
> fence, the order is guaranteed.
>
> Regards,
> Christian.
>
>>
>> Regards,
>> David Zhou
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>       return fence;
>>>>   }
>>>
>>>
>>
>
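The rule in the patch description (skip the pipeline sync only when every fence in sched_sync has signaled), together with the -ENOMEM concern raised in review, can be sketched in user space. This is a hypothetical model, not the driver code: the model_* names are invented, model_sync_get_fence() imitates the way the patch appears to use amdgpu_sync_get_fence() (return the next unsignaled fence, dropping signaled entries), and forcing a pipeline sync when remembering a fence fails is only one conservative option, not what the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model fence: just a flag the "GPU" sets when it completes. */
struct model_fence {
	bool signaled;
};

/* One remembered fence; a singly linked list stands in for job->sched_sync. */
struct model_entry {
	struct model_fence *fence;
	struct model_entry *next;
};

struct model_sync {
	struct model_entry *head;
};

/* Hypothetical job: only the fields this model needs. */
struct model_job {
	struct model_sync sched_sync;
	bool need_pipeline_sync; /* assumption: forced on if remembering a fence fails */
};

/* Like amdgpu_sync_fence(), remembering a fence allocates and may fail. */
static int model_sync_fence(struct model_sync *s, struct model_fence *f)
{
	struct model_entry *e;

	assert(f != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->fence = f;
	e->next = s->head;
	s->head = e;
	return 0;
}

/* Pop entries until an unsignaled fence is found; signaled ones are dropped. */
static struct model_fence *model_sync_get_fence(struct model_sync *s)
{
	while (s->head) {
		struct model_entry *e = s->head;
		struct model_fence *f = e->fence;

		s->head = e->next;
		free(e);
		if (!f->signaled)
			return f;
	}
	return NULL;
}

/* Release whatever is still remembered (counterpart of amdgpu_sync_free()). */
static void model_sync_free(struct model_sync *s)
{
	while (s->head) {
		struct model_entry *e = s->head;

		s->head = e->next;
		free(e);
	}
}

/* Dependency step: remember an optimized fence; on -ENOMEM, fall back to
 * always syncing rather than risk skipping a needed sync (hypothetical). */
static void model_job_add_optimized(struct model_job *job, struct model_fence *f)
{
	if (model_sync_fence(&job->sched_sync, f) == -ENOMEM)
		job->need_pipeline_sync = true;
}

/* Submission step: emit a pipeline sync only if something is still pending. */
static bool model_job_need_sync(struct model_job *job)
{
	return job->need_pipeline_sync ||
	       model_sync_get_fence(&job->sched_sync) != NULL;
}
```

A job whose remembered fences have all signaled needs no pipeline sync; a single pending fence is enough to require one.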