[PATCH 6/6] drm/amdgpu: use more than 64KB fragment size if possible

On 09.08.2016 at 17:49, Jay Cornwall wrote:
> On 2016-08-09 07:52, Christian König wrote:
>> From: Christian König <christian.koenig at amd.com>
>>
>> We align to 64KB, but when userspace aligns even more we can easily 
>> use more.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index e6c030b..88f4109 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -817,13 +817,13 @@ static void amdgpu_vm_frag_ptes(struct
>> amdgpu_pte_update_params    *params,
>>       * allocation size to the fragment size.
>>       */
>>
>> -    /* SI and newer are optimized for 64KB */
>> -    uint64_t frag_flags = AMDGPU_PTE_FRAG(AMDGPU_LOG2_PAGES_PER_FRAG);
>> -    uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
>> +    const uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
>>
>>      uint64_t frag_start = ALIGN(start, frag_align);
>>      uint64_t frag_end = end & ~(frag_align - 1);
>>
>> +    uint32_t frag;
>> +
>>      /* system pages are non continuously */
>>      if (params->src || params->pages_addr || !(flags & AMDGPU_PTE_VALID) ||
>>          (frag_start >= frag_end)) {
>> @@ -832,6 +832,10 @@ static void amdgpu_vm_frag_ptes(struct
>> amdgpu_pte_update_params    *params,
>>          return;
>>      }
>>
>> +    /* use more than 64KB fragment size if possible */
>> +    frag = lower_32_bits(frag_start | frag_end);
>> +    frag = likely(frag) ? __ffs(frag) : 31;
>> +
>>      /* handle the 4K area at the beginning */
>>      if (start != frag_start) {
>>          amdgpu_vm_update_ptes(params, vm, start, frag_start,
>> @@ -841,7 +845,7 @@ static void amdgpu_vm_frag_ptes(struct
>> amdgpu_pte_update_params    *params,
>>
>>      /* handle the area in the middle */
>>      amdgpu_vm_update_ptes(params, vm, frag_start, frag_end, dst,
>> -                  flags | frag_flags);
>> +                  flags | AMDGPU_PTE_FRAG(frag));
>>
>>      /* handle the 4K area at the end */
>>      if (frag_end != end) {
>
> Would this change not direct larger fragments away from the BigK TLB 
> partition?
>
> My understanding was VM_L2_CNTL3.L2_CACHE_BIGK_FRAGMENT_SIZE is an 
> exact match and not a minimum size. I can't find any immediate 
> documentation on that topic to confirm.

Yeah, I was wondering about that myself as well, especially since you
wrote in the initial patch that SI and later are optimized for 64K.

So I tested it on Tonga and Polaris10 and it seems to work as expected:
with a 1MB fragment size, the other page table entries really are no
longer read once an entry is cached.
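
To make the fragment selection in the patch concrete, here is a minimal
userspace sketch of the same computation. It is only an illustration
under stated assumptions, not the driver code: addresses are counted in
4KB GPU pages, LOG2_PAGES_PER_FRAG is assumed to be 4 (64KB), and
lowest_set_bit() and frag_log2() are hypothetical names, with
__builtin_ctz() (GCC/Clang) standing in for the kernel's __ffs().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 pages of 4KB each, i.e. the 64KB minimum fragment. */
#define LOG2_PAGES_PER_FRAG 4

/* Hypothetical userspace stand-in for __ffs(): index of the lowest
 * set bit. The caller must pass a non-zero value. */
static uint32_t lowest_set_bit(uint32_t v)
{
	return (uint32_t)__builtin_ctz(v);
}

/*
 * Return log2 of the fragment size (in pages) usable for the aligned
 * middle part of [start, end), or -1 when the range contains no
 * 64KB-aligned middle part at all.
 */
int frag_log2(uint64_t start, uint64_t end)
{
	const uint64_t frag_align = 1ULL << LOG2_PAGES_PER_FRAG;
	/* Round start up and end down to the 64KB boundary. */
	uint64_t frag_start = (start + frag_align - 1) & ~(frag_align - 1);
	uint64_t frag_end = end & ~(frag_align - 1);
	uint32_t bits;

	assert(start <= end);

	if (frag_start >= frag_end)
		return -1;

	/* The lowest set bit of (start | end) is the largest power of
	 * two that both boundaries are aligned to. Only the lower 32
	 * bits are inspected, so the result is capped at 31. */
	bits = (uint32_t)(frag_start | frag_end);
	return bits ? (int)lowest_set_bit(bits) : 31;
}
```

With these assumptions, frag_log2(256, 512) returns 8, i.e. 2^8 pages
of 4KB = 1MB, while frag_log2(16, 48) returns 4 and stays at 64KB.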

But I'm not sure exactly how this partitioning of the L2 works or what
effect it is supposed to have.

Regards,
Christian.
