Re: [PATCH] drm/amdkfd: Fix partial migrate issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 1/7/2025 2:44 PM, Philip Yang wrote:


On 2025-01-07 10:50, Chen, Xiaogang wrote:


On 1/6/2025 8:02 PM, Deng, Emily wrote:

[AMD Official Use Only - AMD Internal Distribution Only]


 

 

From: Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>
Sent: Monday, January 6, 2025 11:27 PM
To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdkfd: Fix partial migrate issue

 

 

On 1/2/2025 6:06 PM, Emily Deng wrote:

For partial migrate from ram to vram, the migrate->cpages is not
equal to migrate->npages, should use migrate->npages to check all needed
migrate pages which could be copied or not.
 
And only need to set those pages could be migrated to migrate->dst[i], or
the migrate_vma_pages will migrate the wrong pages based on the migrate->dst[i].
 
Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 4b275937d05e..5c96c2d425e3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -278,7 +278,7 @@ svm_migrate_copy_to_vram(struct kfd_node *node, struct svm_range *prange,
                         struct migrate_vma *migrate, struct dma_fence **mfence,
                         dma_addr_t *scratch, uint64_t ttm_res_offset)
 {
-       uint64_t npages = migrate->cpages;
+       uint64_t npages = migrate->npages;

I agree this part.

 
        struct amdgpu_device *adev = node->adev;
        struct device *dev = adev->dev;
        struct amdgpu_res_cursor cursor;
@@ -299,9 +299,6 @@ svm_migrate_copy_to_vram(struct kfd_node *node, struct svm_range *prange,
                struct page *spage;
 
                dst[i] = cursor.start + (j << PAGE_SHIFT);
-               migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
-        svm_migrate_get_vram_page(prange, migrate->dst[i]);
-               migrate->dst[i] = migrate_pfn(migrate->dst[i]);
 
                spage = migrate_pfn_to_page(migrate->src[i]);
                if (spage && !is_zone_device_page(spage)) {
@@ -345,6 +342,9 @@ svm_migrate_copy_to_vram(struct kfd_node *node, struct svm_range *prange,
                } else {
                        j++;
                }
+               migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
+        svm_migrate_get_vram_page(prange, migrate->dst[i]);
+               migrate->dst[i] = migrate_pfn(migrate->dst[i]);

I think what current code misses here is that migrate->dst[i] should match migrate->src[i]: migrate->dst[i](vram page) got set for page that will be migrated from system ram, otherwise migrate->dst[i] should be zero. Your change makes migrates->dst[i] not set though its page has been migrated by svm_migrate_copy_memory_gart for the case that cpages != npages, because you set migrate->dst[i] at end of loop and use 'continue' after migration.

The page migration happens at svm_migrate_copy_memory_gart, not migrate_vma_pages that migrates struct page meta-data from source struct page to destination struct page and has mmu notification.

Regards

Xiaogang

The migrate->dst[i] default value is 0. The purpose of the change is to match the migrate->dst[i] and migrate->src[i]. And for svm_migrate_copy_to_vram, it only will call svm_migrate_copy_memory_gart to copy those pages which need to be migrated, for those don’t need migrate pages, it won’t call svm_migrate_copy_memory_gart.

Yes, the issue here is we need match migrate->dst[i] and migrate->src[i]: migrate->dst[i] need be set only when its correspondent migrate->src[i] page will be migrated or the src page has dma address setup by dma_map_page. Then why not set migrate->dst[i] at same time when have dma_map_page for src page?

yes, agree, I will reply the v2 patch with this change.

That way will simplify the logic and not need use amdgpu_res_next to jump vram cursor.

if spage is not contiguous or dst is the last page of cursor (vram address is not contiguous), we need setup sdma copy and then call amdgpu_res_next to update cursor->start.

I think we only need use amdgpu_res_nex at one place where update migrate->dst[i] together with dma_map_page for src page, but need to remember the jump between two dma_map_page to adapt to cursor->start update.

Regards

Xiaogang

Regards,

Philip

Regards

Xiaogang

 
        }
 
        r = svm_migrate_copy_memory_gart(adev, src + i - j, dst + i - j, j,

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux