[AMD Official Use Only - General]

Thanks for the comment, Felix. I will debug this issue further.

-----Original Message-----
From: Guo, Shikai
Sent: Friday, July 15, 2022 11:21 AM
To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Phillips, Daniel <Daniel.Phillips@xxxxxxx>; Ji, Ruili <Ruili.Ji@xxxxxxx>; Liu, Aaron <Aaron.Liu@xxxxxxx>
Subject: RE: [PATCH] drm/amdkfd: Remove Align VRAM allocations to 1MB on APU ASIC

Thanks for the comment, Felix. I will debug this issue further.

-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Sent: Wednesday, July 13, 2022 10:17 PM
To: Guo, Shikai <Shikai.Guo@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Phillips, Daniel <Daniel.Phillips@xxxxxxx>; Ji, Ruili <Ruili.Ji@xxxxxxx>; Liu, Aaron <Aaron.Liu@xxxxxxx>
Subject: Re: [PATCH] drm/amdkfd: Remove Align VRAM allocations to 1MB on APU ASIC

On 2022-07-13 at 05:14, shikai guo wrote:
> From: Shikai Guo <Shikai.Guo@xxxxxxx>
>
> While executing KFDMemoryTest.MMBench, test case will allocate 4KB size memory 1000 times.
> Every time, user space will get 2M memory. APU VRAM is 512M, there is not enough memory to be allocated.
> So the 2M aligned feature is not suitable for APU.

NAK. We can try to make the estimate of available VRAM more accurate.
But in the end, this comes down to limitations of the VRAM manager and
how it handles memory fragmentation.

A large discrepancy between total VRAM and available VRAM can have a few
reasons:

  * Big system memory means we need to reserve more space for page tables
  * Many small allocations causing lots of fragmentation. This may be
    the result of memory leaks in previous tests

This patch can "fix" a situation where a leak caused excessive
fragmentation. But that just papers over the leak. And it will cause the
opposite problem for the new AvailableMemory test that checks that we
can really allocate as much memory as we promised.

Regards,
  Felix

>
> guoshikai@guoshikai-MayanKD-RMB:~/linth/libhsakmt/tests/kfdtest/build$ ./kfdtest --gtest_filter=KFDMemoryTest.MMBench
> [          ] Profile: Full Test
> [          ] HW capabilities: 0x9
> Note: Google Test filter = KFDMemoryTest.MMBench
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from KFDMemoryTest
> [ RUN      ] KFDMemoryTest.MMBench
> [          ] Found VRAM of 512MB.
> [          ] Available VRAM 328MB.
> [          ] Test (avg. ns)         alloc    mapOne   umapOne    mapAll   umapAll      free
> [          ] --------------------------------------------------------------------------
> [          ] 4K-SysMem-noSDMA       26561     10350      5212      3787      3981     12372
> [          ] 64K-SysMem-noSDMA      42864      6648      3973      5223      3843     15100
> [          ] 2M-SysMem-noSDMA      312906     12614      4390      6254      4790     70260
> [          ] 32M-SysMem-noSDMA    4417812    130437     21625     97687     18500    929562
> [          ] 1G-SysMem-noSDMA   132161000   2738000    583000   2181000    499000  39091000
> [          ] --------------------------------------------------------------------------
> /home/guoshikai/linth/libhsakmt/tests/kfdtest/src/KFDMemoryTest.cpp:922: Failure
> Value of: (hsaKmtAllocMemory(allocNode, bufSize, memFlags, &bufs[i]))
>   Actual: 6
> Expected: HSAKMT_STATUS_SUCCESS
> Which is: 0
> [  FAILED  ] KFDMemoryTest.MMBench (749 ms)
>
> fix this issue by adding different treatments for apu and dgpu
>
> Signed-off-by: ruili ji <ruili.ji@xxxxxxx>
> Signed-off-by: shikai guo <shikai.guo@xxxxxxx>
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index d1657de5f875..2ad2cd5e3e8b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -115,7 +115,9 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
>    * compromise that should work in most cases without reserving too
>    * much memory for page tables unnecessarily (factor 16K, >> 14).
>    */
> -#define ESTIMATE_PT_SIZE(mem_size) max(((mem_size) >> 14), AMDGPU_VM_RESERVED_VRAM)
> +
> +#define ESTIMATE_PT_SIZE(adev, mem_size) (adev->flags & AMD_IS_APU) ? \
> +        (mem_size >> 14) : max(((mem_size) >> 14), AMDGPU_VM_RESERVED_VRAM)
>
>  static size_t amdgpu_amdkfd_acc_size(uint64_t size)
>  {
> @@ -142,7 +144,7 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>                  uint64_t size, u32 alloc_flag)
>  {
>          uint64_t reserved_for_pt =
> -                ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
> +                ESTIMATE_PT_SIZE(adev, amdgpu_amdkfd_total_mem_size);
>          size_t acc_size, system_mem_needed, ttm_mem_needed, vram_needed;
>          int ret = 0;
>
> @@ -156,12 +158,15 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>                  system_mem_needed = acc_size;
>                  ttm_mem_needed = acc_size;
>
> +                if (adev->flags & AMD_IS_APU)
> +                        vram_needed = size;
> +                else
>                  /*
>                   * Conservatively round up the allocation requirement to 2 MB
>                   * to avoid fragmentation caused by 4K allocations in the tail
>                   * 2M BO chunk.
>                   */
> -                vram_needed = ALIGN(size, VRAM_ALLOCATION_ALIGN);
> +                        vram_needed = ALIGN(size, VRAM_ALLOCATION_ALIGN);
>          } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>                  system_mem_needed = acc_size + size;
>                  ttm_mem_needed = acc_size;
> @@ -220,7 +225,10 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
>          } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
>                  kfd_mem_limit.system_mem_used -= acc_size;
>                  kfd_mem_limit.ttm_mem_used -= acc_size;
> -                adev->kfd.vram_used -= ALIGN(size, VRAM_ALLOCATION_ALIGN);
> +                if (adev->flags & AMD_IS_APU)
> +                        adev->kfd.vram_used -= size;
> +                else
> +                        adev->kfd.vram_used -= ALIGN(size, VRAM_ALLOCATION_ALIGN);
>          } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>                  kfd_mem_limit.system_mem_used -= (acc_size + size);
>                  kfd_mem_limit.ttm_mem_used -= acc_size;
> @@ -1666,7 +1674,7 @@ int amdgpu_amdkfd_criu_resume(void *p)
>  size_t amdgpu_amdkfd_get_available_memory(struct amdgpu_device *adev)
>  {
>          uint64_t reserved_for_pt =
> -                ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
> +                ESTIMATE_PT_SIZE(adev, amdgpu_amdkfd_total_mem_size);
>          size_t available;
>
>          spin_lock(&kfd_mem_limit.mem_limit_lock);
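
To make the numbers in the commit message concrete, here is a minimal sketch in C of the accounting arithmetic under discussion. It is a simplified user-space model, not the kernel code: `align_up`, `count_fitting`, and the single `limit` parameter are hypothetical names introduced for illustration. The 2 MB alignment is taken from the comment in the diff, and the 328 MB limit from the "Available VRAM" line of the test log. The real check in amdgpu_amdkfd_reserve_mem_limit() involves more state than this.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)

/* Round size up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of the VRAM limit check: every request of `size`
 * bytes is accounted as align_up(size, align) bytes, and a request is
 * rejected once the accounted total would exceed `limit`.
 * Returns how many of `requests` identical allocations are accepted.
 */
static unsigned int count_fitting(uint64_t limit, uint64_t size,
				  unsigned int requests, uint64_t align)
{
	uint64_t used = 0;
	unsigned int accepted = 0;

	for (unsigned int i = 0; i < requests; i++) {
		uint64_t needed = align_up(size, align);

		if (used + needed > limit)
			break;		/* accounting limit reached */
		used += needed;
		accepted++;
	}
	return accepted;
}
```

Under these assumptions, count_fitting(328 * MiB, 4096, 1000, 2 * MiB) accepts 164 requests and rejects the 165th, while count_fitting(328 * MiB, 4096, 1000, 1) accepts all 1000. The model covers only the accounting side. It says nothing about fragmentation inside the VRAM manager, which is the concern raised in the review.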