On 19.05.2017 at 04:25, Flora Cui wrote:
> On Thu, May 18, 2017 at 01:38:15PM +0200, Christian König wrote:
>> On 18.05.2017 at 09:45, Flora Cui wrote:
>>> partial revert commit <6971d3d> - drm/amdgpu: cleanup logic in
>>> amdgpu_vm_flush
>>>
>>> Change-Id: Iadce9d613dfe9a739643a74050cea55854832adb
>>> Signed-off-by: Flora Cui <Flora.Cui at amd.com>
>> I don't see how the revert should be faster than the original.
>>
>> Especially that amdgpu_vm_had_gpu_reset() is now called twice sounds like
>> more overhead than necessary.
>>
>> Please explain further.
>>
>> Christian.
>>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 +++++---------
>>>  1 file changed, 5 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 88420dc..a96bad6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -743,23 +743,19 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
>>>  		id->gws_size != job->gws_size ||
>>>  		id->oa_base != job->oa_base ||
>>>  		id->oa_size != job->oa_size);
>>> -	bool vm_flush_needed = job->vm_needs_flush ||
>>> -		amdgpu_vm_ring_has_compute_vm_bug(ring);
>>>  	unsigned patch_offset = 0;
>>>  	int r;
>>>
>>> -	if (amdgpu_vm_had_gpu_reset(adev, id)) {
>>> -		gds_switch_needed = true;
>>> -		vm_flush_needed = true;
>>> -	}
>>> -
>>> -	if (!vm_flush_needed && !gds_switch_needed)
>>> +	if (!job->vm_needs_flush && !gds_switch_needed &&
>>> +	    !amdgpu_vm_had_gpu_reset(adev, id) &&
>>> +	    !amdgpu_vm_ring_has_compute_vm_bug(ring))
>>>  		return 0;
>>>
>>>  	if (ring->funcs->init_cond_exec)
>>>  		patch_offset = amdgpu_ring_init_cond_exec(ring);
>>>
>>> -	if (ring->funcs->emit_vm_flush && vm_flush_needed) {
> [flora]: for compute ring & amdgpu_vm_ring_has_compute_vm_bug(), a vm_flush is
> inserted. This might cause a performance drop.

Ah, I see. We only need the pipeline sync, but not the vm flush.
In this case I suggest just changing the following lines in amdgpu_vm_flush():

> -	bool vm_flush_needed = job->vm_needs_flush ||
> -		amdgpu_vm_ring_has_compute_vm_bug(ring);

We can keep the check in amdgpu_vm_need_pipeline_sync().

BTW: We should cache the result of amdgpu_vm_ring_has_compute_vm_bug() in the vm manager structure. Computing this on the fly for every command submission is just a huge bunch of overhead.

Regards,
Christian.