[PATCH 01/13] drm/amdgpu: fix vcn_v1_0_dec_ring_emit_wreg

ckoenig.leichtzumerken@xxxxxxxxx (Christian König) · Tue, 30 Jan 2018 14:51:53 +0100

Am 30.01.2018 um 00:01 schrieb Felix Kuehling:
> This is a very cool patch series. I made some specific comments on some
> of the patches. But overall this is great.

Thanks, going to comment on some of those on the patches.

> I guess your plan is to start testing the SVM programming model with
> ATC, and then enable the same programming model without the IOMMU using
> HMM. That means there could be working and validated tests to run on HMM.

Yes, exactly. Well I think it would make implementing HMM much easier if 
we could rely on most of the mapping being done by the ATC instead of 
manually crafted GPUVM page tables.

> I think 4.16 will get an IOMMUv2 driver that we worked on for Raven to
> support PPR on newer IOMMU versions. Without that the GPU is not able to
> make swapped or new pages resident or trigger a COW.

Uff? So the ATC is actually not able to handle page faults?

> We're currently still working on one problem with Raven, related to the
> way GFX9 retries memory accesses. Many PPR requests for the same virtual
> address can be outstanding (in an IOMMU log buffer). After the first
> request is handled, the GPU can continue, but the remaining requests are
> still in the queue. This can result in the IOMMU driver trying to handle
> a PPR for a page that's already freed by the application, which triggers
> an invalid PPR callback.
>
> An invalid PPR is like the GPU-equivalent of a segfault, and KFD
> implements it like that. With the above behaviour we end up segfaulting
> applications that didn't do anything wrong. I guess for your
> implementation it's not a problem because you don't implement that
> callback yet.

Yeah, that is exactly the same problem I'm currently running into with HMM.

The interrupt handling is pipelined (even much much more than the ATC 
path), so what can happen is that applications free up some memory but 
we have stale page faults for that page in the pipeline.

The only valid workaround I can see is to make sure interrupts are 
processed before returning to HMM that it can unmap pages, and that is a 
really show stopper for performance as far as I can see.

Regards,
Christian.

>
> Regards,
>  Â  Felix
>
>
> On 2018-01-26 03:13 PM, Christian KÃ¶nig wrote:
>> That got mixed up with the encode ring function.
>>
>> Signed-off-by: Christian KÃ¶nig <christian.koenig at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>> index 44c041a1fe68..24ebc3e296a6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>> @@ -880,6 +880,22 @@ static void vcn_v1_0_dec_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>   	vcn_v1_0_dec_vm_reg_wait(ring, data0, data1, mask);
>>   }
>>   
>> +static void vcn_v1_0_dec_ring_emit_wreg(struct amdgpu_ring *ring,
>> +					uint32_t reg, uint32_t val)
>> +{
>> +	struct amdgpu_device *adev = ring->adev;
>> +
>> +	amdgpu_ring_write(ring,
>> +		PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA0), 0));
>> +	amdgpu_ring_write(ring, reg << 2);
>> +	amdgpu_ring_write(ring,
>> +		PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_DATA1), 0));
>> +	amdgpu_ring_write(ring, val);
>> +	amdgpu_ring_write(ring,
>> +		PACKET0(SOC15_REG_OFFSET(UVD, 0, mmUVD_GPCOM_VCPU_CMD), 0));
>> +	amdgpu_ring_write(ring, VCN_DEC_CMD_WRITE_REG << 1);
>> +}
>> +
>>   /**
>>    * vcn_v1_0_enc_ring_get_rptr - get enc read pointer
>>    *
>> @@ -1097,7 +1113,7 @@ static const struct amdgpu_ring_funcs vcn_v1_0_dec_ring_vm_funcs = {
>>   	.pad_ib = amdgpu_ring_generic_pad_ib,
>>   	.begin_use = amdgpu_vcn_ring_begin_use,
>>   	.end_use = amdgpu_vcn_ring_end_use,
>> -	.emit_wreg = vcn_v1_0_enc_ring_emit_wreg,
>> +	.emit_wreg = vcn_v1_0_dec_ring_emit_wreg,
>>   };
>>   
>>   static const struct amdgpu_ring_funcs vcn_v1_0_enc_ring_vm_funcs = {
>> @@ -1124,6 +1140,7 @@ static const struct amdgpu_ring_funcs vcn_v1_0_enc_ring_vm_funcs = {
>>   	.pad_ib = amdgpu_ring_generic_pad_ib,
>>   	.begin_use = amdgpu_vcn_ring_begin_use,
>>   	.end_use = amdgpu_vcn_ring_end_use,
>> +	.emit_wreg = vcn_v1_0_enc_ring_emit_wreg,
>>   };
>>   
>>   static void vcn_v1_0_set_dec_ring_funcs(struct amdgpu_device *adev)