[PATCH 1/6] drm/amdgpu: write PTEs directly into the IB.

deathsimple@xxxxxxxxxxx (Christian König) · Fri, 12 Aug 2016 18:56:04 +0200

On 12.08.2016 at 17:46, Alex Deucher wrote:
> On Fri, Aug 12, 2016 at 9:52 AM, Christian König
> <deathsimple at vodafone.de> wrote:
>> From: Christian König <christian.koenig at amd.com>
>>
>> Write the PTEs at the end of the IB instead of directly into the SDMA commands.
>> This can save quite some CPU cycles building the entries.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 26 +++++++++++++++++++++-----
>>   1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 2843132..7efcbe3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -910,15 +910,15 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>          /* padding, etc. */
>>          ndw = 64;
>>
>> -       if (params.src) {
>> +       if (src) {
>>                  /* only copy commands needed */
>>                  ndw += ncmds * 7;
>>
>> -       } else if (params.pages_addr) {
>> -               /* header for write data commands */
>> -               ndw += ncmds * 4;
>> +       } else if (pages_addr) {
>> +               /* copy commands needed */
>> +               ndw += ncmds * 7;
>>
>> -               /* body of write data command */
>> +               /* and also PTEs */
>>                  ndw += nptes * 2;
>>
>>          } else {
>> @@ -935,6 +935,22 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>
>>          params.ib = &job->ibs[0];
>>
>> +       if (!src && pages_addr) {
>> +               uint64_t *pte;
>> +               unsigned i;
>> +
>> +               /* Put the PTEs at the end of the IB. */
>> +               i = ndw - nptes * 2;
>> +               pte= (uint64_t *)&(job->ibs->ptr[i]);
>> +               params.src = job->ibs->gpu_addr + i * 4;
> Is the offset correct for all asics?  IIRC, ndw was kind of a worst
> case guess as the packet header sizes vary across families.

Yeah, that should work, but I can double check once more.

I don't actually change the dw estimation. Instead of using the
inline write command, I first stitch together the page table entries at
the end of the IB and then use the copy command to move them over to the
page tables.

That has the clear advantage of being way more cache friendly, because
you don't jump around between different things any more.
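
To make the offset math concrete, here is a rough standalone sketch of the
staging step. It is not the driver code: struct mock_ib, mock_map_gart() and
stage_ptes() are made-up names for illustration, GPU_PAGE_SIZE is assumed to
be 4096, and the lookup helper only approximates what amdgpu_vm_map_gart()
does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GPU_PAGE_SIZE 4096u /* assumed stand-in for AMDGPU_GPU_PAGE_SIZE */

/* Hypothetical, minimal model of an IB: a CPU mapping plus its GPU address. */
struct mock_ib {
	uint32_t *ptr;     /* CPU view of the IB, in dwords */
	uint64_t gpu_addr; /* GPU address of ptr[0] */
};

/* Hypothetical stand-in for amdgpu_vm_map_gart(): page lookup plus offset. */
static uint64_t mock_map_gart(const uint64_t *pages_addr, uint64_t addr)
{
	return pages_addr[addr / GPU_PAGE_SIZE] | (addr & (GPU_PAGE_SIZE - 1));
}

/*
 * Write nptes 64-bit PTEs into the last nptes * 2 dwords of an IB that is
 * ndw dwords long, and return the GPU address of the first staged PTE.
 * That address is what a copy command would then use as its source.
 */
static uint64_t stage_ptes(struct mock_ib *ib, unsigned ndw, unsigned nptes,
			   const uint64_t *pages_addr, uint64_t addr,
			   uint64_t flags)
{
	unsigned first_dw, i;

	assert(nptes * 2 <= ndw); /* the PTEs must fit into the IB */
	first_dw = ndw - nptes * 2;

	for (i = 0; i < nptes; ++i) {
		uint64_t pte = mock_map_gart(pages_addr,
					     addr + (uint64_t)i * GPU_PAGE_SIZE);

		pte |= flags;
		/* memcpy so the sketch makes no alignment assumption */
		memcpy(&ib->ptr[first_dw + i * 2], &pte, sizeof(pte));
	}

	/* dword index to byte offset: 4 bytes per dword */
	return ib->gpu_addr + (uint64_t)first_dw * 4;
}
```

With ndw = 32 and nptes = 2, the PTEs land in dwords 28..31 and the returned
source address is gpu_addr + 28 * 4. The command dwords at the start of the
IB are not touched by this step.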

Christian.

>
> Alex
>
>> +
>> +               for (i = 0; i < nptes; ++i) {
>> +                       pte[i] = amdgpu_vm_map_gart(pages_addr, addr + i *
>> +                                                   AMDGPU_GPU_PAGE_SIZE);
>> +                       pte[i] |= flags;
>> +               }
>> +       }
>> +
>>          r = amdgpu_sync_fence(adev, &job->sync, exclusive);
>>          if (r)
>>                  goto error_free;
>> --
>> 2.5.0