On 2022-05-18 17:31, Felix Kuehling
wrote:
On 2022-05-18 13:55, philip yang wrote:Based on the calculation, set 15/16 free system memory limit is enough to prevent OOM killer, as acc_size is up to 1/64 of memory size on 8GPU system, so it is safe to simplify and remove data structure acc_size.
You have a point, I didn't think of that. The fact that TTM isn't tracking the data structure sizes any more doesn't mean, we shouldn't account for it in our own system memory usage.
On 2022-05-17 19:11, Alex Sierra wrote:
TTM used to track the "acc_size" of all BOs internally. We needed to
keep track of it in our memory reservation to avoid TTM running out
of memory in its own accounting. However, that "acc_size" accounting
has since been removed from TTM. Therefore we don't really need to
track it any more.
acc_size is size of amdgpu_bo data structure plus size of pages array and dma_address array, it is needed for each BO, so should track as system_mem_needed. It can be removed from ttm_mem_needed as this is not allocated by TTM as GTT memory.
That said, do we actually have DMA address arrays for VRAM allocations?
Also, acc_size doesn't track the extra dmabuf BOs we create for DMA mappings on multiple GPUs. So I'm not sure how useful the acc_size tracking is at this point. The system memory limit is currently 15/16 of total memory. Maybe that leaves enough reserve for data structure sizes?
One nit-pick below. This patch is Reviewed-by: Philip Yang <philip.yang@xxxxxxx>
Regards,
Felix
Regards,
Philip
Signed-off-by: Alex Sierra<alex.sierra@xxxxxxx>
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 57 ++++++-------------
1 file changed, 16 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index fada3b149361..e985cf9c7ec0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -108,17 +108,8 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
* compromise that should work in most cases without reserving too
* much memory for page tables unnecessarily (factor 16K, >> 14).
*/
-#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)
-
-static size_t amdgpu_amdkfd_acc_size(uint64_t size)
-{
- size >>= PAGE_SHIFT;
- size *= sizeof(dma_addr_t) + sizeof(void *);
- return __roundup_pow_of_two(sizeof(struct amdgpu_bo)) +
- __roundup_pow_of_two(sizeof(struct ttm_tt)) +
- PAGE_ALIGN(size);
-}
+#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)
/**
* amdgpu_amdkfd_reserve_mem_limit() - Decrease available memory by size
Remove "including any reserved for control structures" from function description.
Regards,
Philip
@@ -136,28 +127,22 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
{
uint64_t reserved_for_pt =
ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
- size_t acc_size, system_mem_needed, ttm_mem_needed, vram_needed;
+ size_t system_mem_needed, ttm_mem_needed, vram_needed;
int ret = 0;
- acc_size = amdgpu_amdkfd_acc_size(size);
-
+ system_mem_needed = 0;
+ ttm_mem_needed = 0;
vram_needed = 0;
if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
- system_mem_needed = acc_size + size;
- ttm_mem_needed = acc_size + size;
+ system_mem_needed = size;
+ ttm_mem_needed = size;
} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
- system_mem_needed = acc_size;
- ttm_mem_needed = acc_size;
vram_needed = size;
} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
- system_mem_needed = acc_size + size;
- ttm_mem_needed = acc_size;
- } else if (alloc_flag &
- (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
- KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
- system_mem_needed = acc_size;
- ttm_mem_needed = acc_size;
- } else {
+ system_mem_needed = size;
+ } else if (!(alloc_flag &
+ (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
+ KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))) {
pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
return -ENOMEM;
}
@@ -193,28 +178,18 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
static void unreserve_mem_limit(struct amdgpu_device *adev,
uint64_t size, u32 alloc_flag)
{
- size_t acc_size;
-
- acc_size = amdgpu_amdkfd_acc_size(size);
-
spin_lock(&kfd_mem_limit.mem_limit_lock);
if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
- kfd_mem_limit.system_mem_used -= (acc_size + size);
- kfd_mem_limit.ttm_mem_used -= (acc_size + size);
+ kfd_mem_limit.system_mem_used -= size;
+ kfd_mem_limit.ttm_mem_used -= size;
} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
- kfd_mem_limit.system_mem_used -= acc_size;
- kfd_mem_limit.ttm_mem_used -= acc_size;
adev->kfd.vram_used -= size;
} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
- kfd_mem_limit.system_mem_used -= (acc_size + size);
- kfd_mem_limit.ttm_mem_used -= acc_size;
- } else if (alloc_flag &
- (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
- KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
- kfd_mem_limit.system_mem_used -= acc_size;
- kfd_mem_limit.ttm_mem_used -= acc_size;
- } else {
+ kfd_mem_limit.system_mem_used -= size;
+ } else if (!(alloc_flag &
+ (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
+ KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))) {
pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
goto release;
}