Re: [PATCH] drm/amdkfd: Remove Align VRAM allocations to 1MB on APU ASIC

On Wed, Jul 13, 2022 at 5:14 AM shikai guo <shikai.guo@xxxxxxx> wrote:
>
> From: Shikai Guo <Shikai.Guo@xxxxxxx>
>
> While executing KFDMemoryTest.MMBench, the test case allocates 4 KB of memory 1000 times.
> Each allocation is rounded up to 2 MB. APU VRAM is only 512 MB, so there is not enough memory to satisfy all of them.
> The 2 MB alignment is therefore not suitable for APUs.

Wouldn't it be better to decide based on VRAM size rather than APU vs.
dGPU?  Some APUs have large carve-outs.

Alex

>
> guoshikai@guoshikai-MayanKD-RMB:~/linth/libhsakmt/tests/kfdtest/build$ ./kfdtest --gtest_filter=KFDMemoryTest.MMBench
> [          ] Profile: Full Test
> [          ] HW capabilities: 0x9
> Note: Google Test filter = KFDMemoryTest.MMBench
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from KFDMemoryTest
> [ RUN      ] KFDMemoryTest.MMBench
> [          ] Found VRAM of 512MB.
> [          ] Available VRAM 328MB.
> [          ] Test (avg. ns)         alloc   mapOne  umapOne   mapAll  umapAll     free
> [          ] --------------------------------------------------------------------------
> [          ]   4K-SysMem-noSDMA     26561    10350     5212     3787     3981    12372
> [          ]  64K-SysMem-noSDMA     42864     6648     3973     5223     3843    15100
> [          ]   2M-SysMem-noSDMA    312906    12614     4390     6254     4790    70260
> [          ]  32M-SysMem-noSDMA   4417812   130437    21625    97687    18500   929562
> [          ]   1G-SysMem-noSDMA 132161000  2738000   583000  2181000   499000 39091000
> [          ] --------------------------------------------------------------------------
> /home/guoshikai/linth/libhsakmt/tests/kfdtest/src/KFDMemoryTest.cpp:922: Failure
> Value of: (hsaKmtAllocMemory(allocNode, bufSize, memFlags, &bufs[i]))
>   Actual: 6
> Expected: HSAKMT_STATUS_SUCCESS
> Which is: 0
> [  FAILED  ] KFDMemoryTest.MMBench (749 ms)
>
> Fix this issue by treating APUs and dGPUs differently.
>
> Signed-off-by: ruili ji <ruili.ji@xxxxxxx>
> Signed-off-by: shikai guo <shikai.guo@xxxxxxx>
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index d1657de5f875..2ad2cd5e3e8b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -115,7 +115,9 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
>   * compromise that should work in most cases without reserving too
>   * much memory for page tables unnecessarily (factor 16K, >> 14).
>   */
> -#define ESTIMATE_PT_SIZE(mem_size) max(((mem_size) >> 14), AMDGPU_VM_RESERVED_VRAM)
> +
> +#define ESTIMATE_PT_SIZE(adev, mem_size)   (((adev)->flags & AMD_IS_APU) ? \
> +                ((mem_size) >> 14) : max(((mem_size) >> 14), AMDGPU_VM_RESERVED_VRAM))
>
>  static size_t amdgpu_amdkfd_acc_size(uint64_t size)
>  {
> @@ -142,7 +144,7 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>                 uint64_t size, u32 alloc_flag)
>  {
>         uint64_t reserved_for_pt =
> -               ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
> +               ESTIMATE_PT_SIZE(adev, amdgpu_amdkfd_total_mem_size);
>         size_t acc_size, system_mem_needed, ttm_mem_needed, vram_needed;
>         int ret = 0;
>
> @@ -156,12 +158,15 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>                 system_mem_needed = acc_size;
>                 ttm_mem_needed = acc_size;
>
> +               if (adev->flags & AMD_IS_APU)
> +                       vram_needed = size;
> +               else
>                 /*
>                  * Conservatively round up the allocation requirement to 2 MB
>                  * to avoid fragmentation caused by 4K allocations in the tail
>                  * 2M BO chunk.
>                  */
> -               vram_needed = ALIGN(size, VRAM_ALLOCATION_ALIGN);
> +                       vram_needed = ALIGN(size, VRAM_ALLOCATION_ALIGN);
>         } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>                 system_mem_needed = acc_size + size;
>                 ttm_mem_needed = acc_size;
> @@ -220,7 +225,10 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
>         } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
>                 kfd_mem_limit.system_mem_used -= acc_size;
>                 kfd_mem_limit.ttm_mem_used -= acc_size;
> -               adev->kfd.vram_used -= ALIGN(size, VRAM_ALLOCATION_ALIGN);
> +               if (adev->flags & AMD_IS_APU)
> +                       adev->kfd.vram_used -= size;
> +               else
> +                       adev->kfd.vram_used -= ALIGN(size, VRAM_ALLOCATION_ALIGN);
>         } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>                 kfd_mem_limit.system_mem_used -= (acc_size + size);
>                 kfd_mem_limit.ttm_mem_used -= acc_size;
> @@ -1666,7 +1674,7 @@ int amdgpu_amdkfd_criu_resume(void *p)
>  size_t amdgpu_amdkfd_get_available_memory(struct amdgpu_device *adev)
>  {
>         uint64_t reserved_for_pt =
> -               ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
> +               ESTIMATE_PT_SIZE(adev, amdgpu_amdkfd_total_mem_size);
>         size_t available;
>
>         spin_lock(&kfd_mem_limit.mem_limit_lock);
> --
> 2.25.1
>
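
For reference, the arithmetic behind the failure and the size-based alternative raised above can be sketched in plain C. This is a simplified userspace model, not the driver code: align_up(), vram_needed(), allocs_that_fit() and the threshold parameter are hypothetical names introduced only for illustration, and the budget is taken from the "Available VRAM 328MB" line in the test log rather than from the driver's real accounting (which also reserves space for page tables).

```c
#include <stdint.h>

#define KIB(x) ((uint64_t)(x) << 10)
#define MIB(x) ((uint64_t)(x) << 20)

/* Round size up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t size, uint64_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/*
 * Hypothetical policy: charge the 2 MB-aligned size only when the device
 * has at least `threshold` bytes of VRAM; otherwise charge the exact size.
 * The threshold is an assumption, not a value from the patch.
 */
static uint64_t vram_needed(uint64_t size, uint64_t vram_total,
			    uint64_t threshold)
{
	if (vram_total < threshold)
		return size;
	return align_up(size, MIB(2));
}

/* Count how many of `count` allocations fit into `available` bytes. */
static unsigned int allocs_that_fit(uint64_t size, unsigned int count,
				    uint64_t available, uint64_t vram_total,
				    uint64_t threshold)
{
	uint64_t used = 0;
	unsigned int i;

	for (i = 0; i < count; i++) {
		uint64_t need = vram_needed(size, vram_total, threshold);

		if (used + need > available)
			break;
		used += need;
	}
	return i;
}
```

Under this model, with a threshold of 0 (always align), only 164 of the 1000 4 KB allocations fit into 328 MB, because each one is charged 2 MB. With a threshold above 512 MB (for example 1 GB, chosen arbitrarily here), all 1000 fit, since together they need under 4 MB.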


