[PATCH 6/6] drm/amdgpu: use more than 64KB fragment size if possible

deathsimple@xxxxxxxxxxx (Christian König) · Tue, 9 Aug 2016 18:35:30 +0200

On 09.08.2016 at 17:49, Jay Cornwall wrote:
> On 2016-08-09 07:52, Christian König wrote:
>> From: Christian König <christian.koenig at amd.com>
>>
>> We align to 64KB, but when userspace aligns even more we can easily 
>> use more.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index e6c030b..88f4109 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -817,13 +817,13 @@ static void amdgpu_vm_frag_ptes(struct
>> amdgpu_pte_update_params    *params,
>>       * allocation size to the fragment size.
>>       */
>>
>> -    /* SI and newer are optimized for 64KB */
>> -    uint64_t frag_flags = AMDGPU_PTE_FRAG(AMDGPU_LOG2_PAGES_PER_FRAG);
>> -    uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
>> +    const uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
>>
>>      uint64_t frag_start = ALIGN(start, frag_align);
>>      uint64_t frag_end = end & ~(frag_align - 1);
>>
>> +    uint32_t frag;
>> +
>>      /* system pages are non continuously */
>>      if (params->src || params->pages_addr || !(flags & 
>> AMDGPU_PTE_VALID) ||
>>          (frag_start >= frag_end)) {
>> @@ -832,6 +832,10 @@ static void amdgpu_vm_frag_ptes(struct
>> amdgpu_pte_update_params    *params,
>>          return;
>>      }
>>
>> +    /* use more than 64KB fragment size if possible */
>> +    frag = lower_32_bits(frag_start | frag_end);
>> +    frag = likely(frag) ? __ffs(frag) : 31;
>> +
>>      /* handle the 4K area at the beginning */
>>      if (start != frag_start) {
>>          amdgpu_vm_update_ptes(params, vm, start, frag_start,
>> @@ -841,7 +845,7 @@ static void amdgpu_vm_frag_ptes(struct
>> amdgpu_pte_update_params    *params,
>>
>>      /* handle the area in the middle */
>>      amdgpu_vm_update_ptes(params, vm, frag_start, frag_end, dst,
>> -                  flags | frag_flags);
>> +                  flags | AMDGPU_PTE_FRAG(frag));
>>
>>      /* handle the 4K area at the end */
>>      if (frag_end != end) {
>
> Would this change not direct larger fragments away from the BigK TLB 
> partition?
>
> My understanding was VM_L2_CNTL3.L2_CACHE_BIGK_FRAGMENT_SIZE is an 
> exact match and not a minimum size. I can't find any immediate 
> documentation on that topic to confirm.

Yeah, I was questioning that myself as well, especially since you wrote 
in the initial patch that SI and later are optimized for 64K.

So I tested it on Tonga and Polaris10 and it seems to work as expected, 
e.g. with a 1MB fragment size the other page table entries really are no 
longer read once the entry is cached.
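
For illustration, here is a minimal userspace sketch of the fragment 
selection from the patch above. It is not the driver code. It assumes 
that start and end are given in 4KB GPU pages and that 
AMDGPU_LOG2_PAGES_PER_FRAG is 4 (16 pages = 64KB). The names 
LOG2_PAGES_PER_FRAG, lowest_set_bit(), frag_log2() and frag_bytes() are 
made up for this example; lowest_set_bit() is a portable stand-in for the 
kernel's __ffs(). The early return for frag_start >= frag_end is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2^4 = 16 pages of 4KB each, i.e. the 64KB default fragment. */
#define LOG2_PAGES_PER_FRAG 4u

/* Stand-in for the kernel's __ffs(): index of the lowest set bit.
 * Like __ffs(), the result is undefined for 0, so we assert on it. */
static unsigned lowest_set_bit(uint32_t x)
{
	unsigned n = 0;

	assert(x != 0);
	while (!(x & 1u)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* Returns log2 of the fragment size in pages for the range [start, end),
 * mirroring the computation in the patch. Both values are in 4KB pages. */
static unsigned frag_log2(uint64_t start, uint64_t end)
{
	const uint64_t frag_align = 1ull << LOG2_PAGES_PER_FRAG;

	/* Round start up and end down to the 64KB boundary. */
	uint64_t frag_start = (start + frag_align - 1) & ~(frag_align - 1);
	uint64_t frag_end = end & ~(frag_align - 1);

	/* The common alignment of both ends is the lowest set bit of the OR. */
	uint32_t bits = (uint32_t)(frag_start | frag_end);

	return bits ? lowest_set_bit(bits) : 31;
}

/* Fragment size in bytes for a given log2 value, assuming 4KB pages. */
static uint64_t frag_bytes(unsigned frag)
{
	return (uint64_t)4096 << frag;
}
```

With these assumptions, frag_log2(0x100, 0x200) gives 8, which is a 1MB 
fragment, while a range that is only 64KB aligned such as 
frag_log2(0x10, 0x30) stays at 4.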

But I'm not sure how exactly this partitioning of the L2 works and what 
effect it should have.

Regards,
Christian.