答复: 答复: [PATCH 12/13] drm/amdgpu:changes in gfx DMAframe scheme

Monk.Liu@xxxxxxx (Liu, Monk) · Tue, 28 Mar 2017 08:42:16 +0000

No sure what version of the code your are working with, but at least on
amd-staging-4.9 the 128 NOPs are still present for GFX8.
[ml] the following patch will remove the 128 NOPs after vm flush for gfx8

GFX7 uses the double switch buffer package and that can be removed if you add this in the vm_flush code.
[ML] GFX7 doesn't support SRIOV, so we don't set the "switch_buffer" function member " emit_switch_buffer" to its ring structure, thus it need manually insert switch_buffers in gfx_v7_0_vm_flush() 
But for gfx 8 and gfx9, because they support SRIOV, so we set the function member "emit_switch_buffer", and the ib_schedule() will know it and will insert it at bottom of each frame.
For amdgpu_vm_flush(), it will only insert SWITCH_BUFFER when the gfx ring has " emit_switch_buffer" member, so amdgpu_vm_flush() actually behaves the same for gfx7 and gfx6, only affect behave of gfx8 & gfx9, on SRIOV case.

Additional to that you haven't answered why we need the conditional execution for the vm flush?

[ML]I already answered the question, in git log message:
>> this change can fix vm fault issue caused by below scenario without 
>> this change:
>>
>>> CE passed original COND_EXEC (no MCBP issued this moment),
>>    proceed as normal.
>>
>>> DE catch up to this COND_EXEC, but this time MCBP issued,
>>    thus DE treats all following packages as NOP. The following
>>    VM switch packages now looks just as NOP to DE, so DE
>>    dosen't do VM flush at all.
>>
>>> Now CE proceeds to the first IBc, and triggers VM fault,
>>    because DE didn't do VM flush for this DMAframe.

With this new CE wrapping around IB, that can avoid above corner case.

-----é?®ä»¶å??ä»¶-----
å??ä»¶äºº: Christian KÃ¶nig [mailto:deathsimple at vodafone.de] 
å??é??æ?¶é?´: Tuesday, March 28, 2017 4:34 PM
æ?¶ä»¶äºº: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
ä¸»é¢?: Re: ç?å¤?: [PATCH 12/13] drm/amdgpu:changes in gfx DMAframe scheme

No sure what version of the code your are working with, but at least on
amd-staging-4.9 the 128 NOPs are still present for GFX8.

GFX7 uses the double switch buffer package and that can be removed if you add this in the vm_flush code.

Additional to that you haven't answered why we need the conditional execution for the vm flush?

Regards,
Christian.

Am 28.03.2017 um 05:23 schrieb Liu, Monk:
> Christian
>
> For the 128 NOPs after VM flush, already removed for gfx8 & 7
>
> BR  Monk
>
> -----é?®ä»¶å??ä»¶-----
> å??ä»¶äºº: Christian KÃ¶nig [mailto:deathsimple at vodafone.de]
> å??é??æ?¶é?´: Tuesday, March 28, 2017 12:17 AM
> æ?¶ä»¶äºº: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> ä¸»é¢?: Re: [PATCH 12/13] drm/amdgpu:changes in gfx DMAframe scheme
>
> Am 24.03.2017 um 11:38 schrieb Monk Liu:
>> 1) Adapt to vulkan:
>> Now use double SWITCH BUFFER to replace the 128 nops w/a, because 
>> when vulkan introduced, umd can insert 7 ~ 16 IBs per submit which 
>> makes
>> 256 DW size cannot hold the whole DMAframe (if we still insert those
>> 128 nops), CP team suggests use double SWITCH_BUFFERs, instead of 
>> tricky 128 NOPs w/a.
>>
>> 2) To fix the CE VM fault issue when MCBP introduced:
>> Need one more COND_EXEC wrapping IB part (original one us for VM 
>> switch part).
>>
>> this change can fix vm fault issue caused by below scenario without 
>> this change:
>>
>>> CE passed original COND_EXEC (no MCBP issued this moment),
>>    proceed as normal.
>>
>>> DE catch up to this COND_EXEC, but this time MCBP issued,
>>    thus DE treats all following packages as NOP. The following
>>    VM switch packages now looks just as NOP to DE, so DE
>>    dosen't do VM flush at all.
>>
>>> Now CE proceeds to the first IBc, and triggers VM fault,
>>    because DE didn't do VM flush for this DMAframe.
>>
>> 3) change estimated alloc size for gfx9.
>> with new DMAframe scheme, we need modify emit_frame_size for gfx9
>>
>> with above changes, no more 128 NOPs w/a after VM flush
> So we are back to square one? Cause that is exactly what we used to have which you replaced with the 128 NOPs.
>
>> Change-Id: Ib3f92d9d5a81bfff0369a00f23e1e5891797089a
>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> Sorry for the delay, have been busy with tracking down a hardware issue.
>
> First of all please fix your editor to correctly indent your code. This file once more doesn't has correct indentation.
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c |  8 ++--
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 77 +++++++++++++++++++++-------------
>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 29 ++++++++-----
>>    3 files changed, 69 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> index d103270..b300929 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> @@ -167,9 +167,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>    		return r;
>>    	}
>>    
>> -	if (ring->funcs->init_cond_exec)
>> -		patch_offset = amdgpu_ring_init_cond_exec(ring);
>> -
>>    	if (vm) {
>>    		amdgpu_ring_insert_nop(ring, extra_nop); /* prevent CE go too 
>> fast than DE */
>>    
>> @@ -180,7 +177,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>    		}
>>    	}
>>    
>> -	if (ring->funcs->emit_hdp_flush
>> +	if (ring->funcs->init_cond_exec)
>> +		patch_offset = amdgpu_ring_init_cond_exec(ring);
>> +
>> +		if (ring->funcs->emit_hdp_flush
> Ok, so the conditional execution shouldn't cover the VM flush any more.
> That makes perfect sense.
>
> Maybe split that up as separate patch? Would make reviewing and editing this one easier.
>
>>    #ifdef CONFIG_X86_64
>>    	    && !(adev->flags & AMD_IS_APU)
>>    #endif
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 9ff445c..74be4fa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -483,42 +483,59 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
>>    		id->oa_size != job->oa_size);
>>    	int r;
>>    
>> -	if (ring->funcs->emit_pipeline_sync && (
>> -	    job->vm_needs_flush || gds_switch_needed ||
>> -	    amdgpu_vm_ring_has_compute_vm_bug(ring)))
>> -		amdgpu_ring_emit_pipeline_sync(ring);
>> +	if (job->vm_needs_flush || gds_switch_needed ||
>> +		amdgpu_vm_is_gpu_reset(adev, id) ||
>> +		amdgpu_vm_ring_has_compute_vm_bug(ring)) {
>> +		unsigned patch_offset = 0;
>>    
>> -	if (ring->funcs->emit_vm_flush && (job->vm_needs_flush ||
>> -	    amdgpu_vm_is_gpu_reset(adev, id))) {
>> -		struct fence *fence;
>> -		u64 pd_addr = amdgpu_vm_adjust_mc_addr(adev, job->vm_pd_addr);
>> +		if (ring->funcs->init_cond_exec)
>> +			patch_offset = amdgpu_ring_init_cond_exec(ring);
> Ok, why are you inserting a cond_exec for the VM flush here?
>
>>    
>> -		trace_amdgpu_vm_flush(pd_addr, ring->idx, job->vm_id);
>> -		amdgpu_ring_emit_vm_flush(ring, job->vm_id, pd_addr);
>> +		if (ring->funcs->emit_pipeline_sync &&
>> +			(job->vm_needs_flush || gds_switch_needed ||
>> +			amdgpu_vm_ring_has_compute_vm_bug(ring)))
>> +			amdgpu_ring_emit_pipeline_sync(ring);
>>    
>> -		r = amdgpu_fence_emit(ring, &fence);
>> -		if (r)
>> -			return r;
>> +		if (ring->funcs->emit_vm_flush && (job->vm_needs_flush ||
>> +			amdgpu_vm_is_gpu_reset(adev, id))) {
>> +			struct fence *fence;
>> +			u64 pd_addr = amdgpu_vm_adjust_mc_addr(adev, job->vm_pd_addr);
>>    
>> -		mutex_lock(&adev->vm_manager.lock);
>> -		fence_put(id->last_flush);
>> -		id->last_flush = fence;
>> -		mutex_unlock(&adev->vm_manager.lock);
>> -	}
>> +			trace_amdgpu_vm_flush(pd_addr, ring->idx, job->vm_id);
>> +			amdgpu_ring_emit_vm_flush(ring, job->vm_id, pd_addr);
>>    
>> -	if (gds_switch_needed) {
>> -		id->gds_base = job->gds_base;
>> -		id->gds_size = job->gds_size;
>> -		id->gws_base = job->gws_base;
>> -		id->gws_size = job->gws_size;
>> -		id->oa_base = job->oa_base;
>> -		id->oa_size = job->oa_size;
>> -		amdgpu_ring_emit_gds_switch(ring, job->vm_id,
>> -					    job->gds_base, job->gds_size,
>> -					    job->gws_base, job->gws_size,
>> -					    job->oa_base, job->oa_size);
>> -	}
>> +			r = amdgpu_fence_emit(ring, &fence);
>> +			if (r)
>> +				return r;
>>    
>> +			mutex_lock(&adev->vm_manager.lock);
>> +			fence_put(id->last_flush);
>> +			id->last_flush = fence;
>> +			mutex_unlock(&adev->vm_manager.lock);
>> +		}
>> +
>> +		if (gds_switch_needed) {
>> +			id->gds_base = job->gds_base;
>> +			id->gds_size = job->gds_size;
>> +			id->gws_base = job->gws_base;
>> +			id->gws_size = job->gws_size;
>> +			id->oa_base = job->oa_base;
>> +			id->oa_size = job->oa_size;
>> +			amdgpu_ring_emit_gds_switch(ring, job->vm_id,
>> +							job->gds_base, job->gds_size,
>> +							job->gws_base, job->gws_size,
>> +							job->oa_base, job->oa_size);
>> +		}
>> +
>> +		if (ring->funcs->patch_cond_exec)
>> +			amdgpu_ring_patch_cond_exec(ring, patch_offset);
>> +
>> +		/* the double SWITCH_BUFFER here *cannot* be skipped by COND_EXEC */
>> +		if (ring->funcs->emit_switch_buffer) {
>> +			amdgpu_ring_emit_switch_buffer(ring);
>> +			amdgpu_ring_emit_switch_buffer(ring);
>> +		}
>> +	}
>>    	return 0;
>>    }
>>    
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index ce6fa03..eb8551b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -3134,8 +3134,6 @@ static void gfx_v9_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>    		/* sync PFP to ME, otherwise we might get invalid PFP reads */
>>    		amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
>>    		amdgpu_ring_write(ring, 0x0);
>> -		/* Emits 128 dw nop to prevent CE access VM before vm_flush finish */
>> -		amdgpu_ring_insert_nop(ring, 128);
> If this isn't needed any more you should remove that from gfx8 and gfx7 as well.
>
> Regards,
> Christian.
>
>>    	}
>>    }
>>    
>> @@ -3629,15 +3627,24 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
>>    	.get_rptr = gfx_v9_0_ring_get_rptr_gfx,
>>    	.get_wptr = gfx_v9_0_ring_get_wptr_gfx,
>>    	.set_wptr = gfx_v9_0_ring_set_wptr_gfx,
>> -	.emit_frame_size =
>> -		20 + /* gfx_v9_0_ring_emit_gds_switch */
>> -		7 + /* gfx_v9_0_ring_emit_hdp_flush */
>> -		5 + /* gfx_v9_0_ring_emit_hdp_invalidate */
>> -		8 + 8 + 8 +/* gfx_v9_0_ring_emit_fence x3 for user fence, vm fence */
>> -		7 + /* gfx_v9_0_ring_emit_pipeline_sync */
>> -		128 + 66 + /* gfx_v9_0_ring_emit_vm_flush */
>> -		2 + /* gfx_v9_ring_emit_sb */
>> -		3, /* gfx_v9_ring_emit_cntxcntl */
>> +	.emit_frame_size = /* totally 242 maximum if 16 IBs */
>> +		5 +  /* COND_EXEC */
>> +		7 +  /* PIPELINE_SYNC */
>> +		46 + /* VM_FLUSH */
>> +		8 +  /* FENCE for VM_FLUSH */
>> +		20 + /* GDS switch */
>> +		4 + /* double SWITCH_BUFFER,
>> +		       the first COND_EXEC jump to the place just
>> +			   prior to this double SWITCH_BUFFER  */
>> +		5 + /* COND_EXEC */
>> +		7 +	 /*	HDP_flush */
>> +		4 +	 /*	VGT_flush */
>> +		14 + /*	CE_META */
>> +		31 + /*	DE_META */
>> +		3 + /* CNTX_CTRL */
>> +		5 + /* HDP_INVL */
>> +		8 + 8 + /* FENCE x2 */
>> +		2, /* SWITCH_BUFFER */
>>    	.emit_ib_size =	4, /* gfx_v9_0_ring_emit_ib_gfx */
>>    	.emit_ib = gfx_v9_0_ring_emit_ib_gfx,
>>    	.emit_fence = gfx_v9_0_ring_emit_fence,
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx