Am 31.05.22 um 17:22 schrieb Felix Kuehling:
Am 2022-05-31 um 04:34 schrieb Lang Yu:The kfd_bo_list is used to restore process BOs after evictions. As page tables could be destroyed during evictions, we should also update pinned BOs' page tables during restoring to make sure they are valid. So for pinned BOs, 1, Don't validate them, but update their page tables. 2, Don't add eviction fence for them. Signed-off-by: Lang Yu <Lang.Yu@xxxxxxx> --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 39 ++++++++++--------- 1 file changed, 20 insertions(+), 19 deletions(-)diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.cindex 34ba9e776521..67c4bf1944d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c@@ -1953,9 +1953,6 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct amdgpu_device *adev,return -EINVAL; } - /* delete kgd_mem from kfd_bo_list to avoid re-validating - * this BO in BO's restoring after eviction. - */ mutex_lock(&mem->process_info->lock); ret = amdgpu_bo_reserve(bo, true);@@ -1978,7 +1975,6 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct amdgpu_device *adev,amdgpu_amdkfd_remove_eviction_fence( bo, mem->process_info->eviction_fence); - list_del_init(&mem->validate_list.head); if (size) *size = amdgpu_bo_size(bo);@@ -2481,24 +2477,26 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)uint32_t domain = mem->domain; struct kfd_mem_attachment *attachment; - total_size += amdgpu_bo_size(bo); + if (!bo->tbo.pin_count) {I think this special case is not necessary. Validating pinned BOs that are already valid should have very low overhead. So adding a special case to avoid that is not really worth it.
I would put this check into amdgpu_amdkfd_bo_validate(). Otherwise you get a nice error if a BO is pinned to VRAM and you try to validate it into GTT for example.
On the other hand that error might be exactly what you want in this case.Anyway, with or without this, feel free to add an Acked-by: Christian König <christian.koenig@xxxxxxx> to this patch.
Regards, Christian.
Other than that, this patch looks good to me. Regards, Felix+ total_size += amdgpu_bo_size(bo); - ret = amdgpu_amdkfd_bo_validate(bo, domain, false); - if (ret) { - pr_debug("Memory eviction: Validate BOs failed\n"); - failed_size += amdgpu_bo_size(bo); - ret = amdgpu_amdkfd_bo_validate(bo, - AMDGPU_GEM_DOMAIN_GTT, false); + ret = amdgpu_amdkfd_bo_validate(bo, domain, false); + if (ret) { + pr_debug("Memory eviction: Validate BOs failed\n"); + failed_size += amdgpu_bo_size(bo);+ ret = amdgpu_amdkfd_bo_validate(bo, AMDGPU_GEM_DOMAIN_GTT, false);+ if (ret) { + pr_debug("Memory eviction: Try again\n"); + goto validate_map_fail; + } + } + ret = amdgpu_sync_fence(&sync_obj, bo->tbo.moving); if (ret) { - pr_debug("Memory eviction: Try again\n");+ pr_debug("Memory eviction: Sync BO fence failed. Try again\n");goto validate_map_fail; } } - ret = amdgpu_sync_fence(&sync_obj, bo->tbo.moving); - if (ret) {- pr_debug("Memory eviction: Sync BO fence failed. Try again\n");- goto validate_map_fail; - } + list_for_each_entry(attachment, &mem->attachments, list) { if (!attachment->is_mapped) continue;@@ -2544,10 +2542,13 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)/* Attach new eviction fence to all BOs */ list_for_each_entry(mem, &process_info->kfd_bo_list, - validate_list.head) + validate_list.head) { + if (mem->bo->tbo.pin_count) + continue; + amdgpu_bo_fence(mem->bo, &process_info->eviction_fence->base, true); - + } /* Attach eviction fence to PD / PT BOs */ list_for_each_entry(peer_vm, &process_info->vm_list_head, vm_list_node) {