On 9/5/2023 9:02 AM, Philip Yang wrote:
On 2023-08-31 17:29, Chen, Xiaogang wrote:
On 8/31/2023 3:59 PM, Felix Kuehling wrote:
On 2023-08-31 16:33, Chen, Xiaogang wrote:
That said, I'm not actually sure why we're freeing the DMA
address array after migration to VRAM at all. I think we still
need it even when we're using VRAM. We call svm_range_dma_map
in svm_range_validate_and_map regardless of whether the range
is in VRAM or system memory. So it will just allocate a new
array the next time the range is validated anyway. VRAM pages
use a special address encoding to indicate VRAM pages to the
GPUVM code.
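For reference, the per-page handling in svm_range_dma_map_dev looks
roughly like the simplified sketch below (helper and field names follow
kfd_svm.c; the exact details may differ in the tree this patch is based
on, and the local variables are those of the enclosing function):

	/* Simplified sketch of the per-page loop in svm_range_dma_map_dev().
	 * addr points into prange->dma_addr[gpuidx], dir is DMA_BIDIRECTIONAL.
	 */
	for (i = 0; i < npages; i++) {
		if (svm_is_valid_dma_mapping_addr(dev, addr[i]))
			/* stale system-memory entry: unmap before overwriting */
			dma_unmap_page(dev, addr[i], PAGE_SIZE, dir);

		page = hmm_pfn_to_page(hmm_pfns[i]);
		if (is_zone_device_page(page)) {
			/* VRAM page: store an encoded physical address, tagged
			 * with SVM_RANGE_VRAM_DOMAIN so GPUVM maps it as VRAM
			 */
			addr[i] = (hmm_pfns[i] << PAGE_SHIFT) +
				  bo_adev->vm_manager.vram_base_offset -
				  bo_adev->kfd.pgmap.range.start;
			addr[i] |= SVM_RANGE_VRAM_DOMAIN;
			continue;
		}

		/* system-memory page: create a fresh DMA mapping for this GPU */
		addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
		r = dma_mapping_error(dev, addr[i]);
		if (r)
			return r;
	}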
I think we do not need to free the DMA address array, as you said;
that is a separate issue though.
We need to unmap the DMA addresses (dma_unmap_page) after migrating
from RAM to VRAM because we always do dma_map_page in
svm_range_validate_and_map. If not, we would end up with multiple DMA
mappings for the same system RAM page.
svm_range_dma_map_dev calls dma_unmap_page before overwriting an
existing valid entry in the dma_addr array. Anyway, dma
unmapping the old pages in bulk may still be cleaner. And it
avoids delays in cleaning up DMA mappings after migrations.
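Something along these lines, called once right after the RAM->VRAM
migration completes, could do that (just a sketch; svm_range_dma_unmap_all
is a made-up name, and the loop shape mirrors svm_range_free_dma_mappings):

/* Sketch: unmap the now-stale system-memory DMA addresses for all GPUs
 * in bulk after a RAM->VRAM migration, but keep prange->dma_addr[] so it
 * can be reused for the VRAM addresses on the next validate_and_map.
 */
static void svm_range_dma_unmap_all(struct svm_range *prange)
{
	struct kfd_process *p = container_of(prange->svms,
					     struct kfd_process, svms);
	uint32_t gpuidx;

	for (gpuidx = 0; gpuidx < MAX_GPU_INSTANCE; gpuidx++) {
		dma_addr_t *addr = prange->dma_addr[gpuidx];
		struct kfd_process_device *pdd;
		struct device *dev;
		unsigned long i;

		if (!addr)
			continue;

		pdd = kfd_process_device_from_gpuidx(p, gpuidx);
		if (!pdd)
			continue;
		dev = &pdd->dev->adev->pdev->dev;

		for (i = 0; i < prange->npages; i++) {
			if (!svm_is_valid_dma_mapping_addr(dev, addr[i]))
				continue;
			dma_unmap_page(dev, addr[i], PAGE_SIZE,
				       DMA_BIDIRECTIONAL);
			addr[i] = 0;
		}
	}
}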
Regards,
Felix
Then we may not need to do the dma_unmap after migrating from RAM to
VRAM, since svm_range_dma_map_dev always does dma_unmap_page if the
address is a valid DMA address for system RAM, and after migrating
from RAM to VRAM we always do the GPU mapping?
I think with XNACK enabled, the DMA mapping may be delayed until a
page fault. For example on a multi-GPU system, GPU1 page faults
and migrates data from system memory to its VRAM. Immediately
afterwards, the page fault handler should use svm_range_validate_and_map
to update GPU1 page tables. But GPU2 page tables are not updated
immediately. So the now stale DMA mappings for GPU2 would continue
to exist until the next page fault on GPU2.
Regards,
Felix
If I understand correctly: when the user calls svm_range_set_attr and
p->xnack_enabled is true, we can skip calling
svm_range_validate_and_map. We postpone validating the buffer and
mapping it on the GPU until a page fault, i.e. the time the buffer is
actually used by a GPU, and only DMA-map and GPU-map it for that GPU.
The current implementation of svm_range_set_attr skips the
validation after migration if XNACK is off, because it is handled by
svm_range_restore_work that gets scheduled by the MMU notifier
triggered by the migration.
With XNACK on, svm_range_set_attr currently validates and maps after
migration assuming that the data will be used by the GPU(s) soon.
That is something we could change and let page faults take care of
the mappings as needed.
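For context, the relevant decision in svm_range_set_attr looks roughly
like this (simplified; the exact conditions and the
svm_range_validate_and_map argument list may differ in this tree):

	/* Simplified sketch of the post-migration handling in
	 * svm_range_set_attr() for each prange in the update list.
	 */
	if (migrated && !p->xnack_enabled && prange->mapped_to_gpu) {
		/* XNACK off: svm_range_restore_work, scheduled by the MMU
		 * notifier that the migration triggered, will revalidate
		 * and map the range, so nothing to do here.
		 */
		mutex_unlock(&prange->migrate_mutex);
		continue;
	}

	/* XNACK on (or no migration happened): validate and map now,
	 * assuming the GPU(s) will touch the data soon.  This is the part
	 * that could be left to retry faults instead.
	 */
	r = svm_range_validate_and_map(mm, prange, MAX_GPU_INSTANCE,
				       true, true, flush_tlb);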
Yes, with XNACK on, my understanding is that we can skip
svm_range_validate_and_map in svm_range_set_attr after migration; the
page fault handler will then do the DMA and GPU mapping. That would
save the initial DMA and GPU mapping, which applies to all GPUs the
user asked access for. The current GPU page fault handler then only
does the DMA and GPU mapping for the GPU that triggered the fault. Is
that ok?
With XNACK on, after prefetching the range to a GPU we still need
svm_range_validate_and_map to update the mapping of the GPU migrated
to (and also the mappings of GPUs with access_in_place), because the
app prefetches to the GPU precisely to avoid GPU page faults.
With XNACK on, postponing the GPU mapping to the page fault handler
may save some work, since we would update the mapping only on the GPU
that needs it, but that is not for this patch anyway.
After migrating to VRAM, we only need to dma_unmap_page the entries in
the prange->dma_addr array; we don't need to free the dma_addr array
itself, as it can be reused to store the VRAM addresses to map to the
GPU.
Yes, we do not need to free the dma_addr array; we only need the
dma_unmap_page part of svm_range_free_dma_mappings. The array stores
both system RAM DMA addresses and VRAM physical addresses. We can free
the array in svm_range_free.
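Roughly like this (sketch only; svm_range_dma_unmap_all is the
hypothetical helper sketched earlier in this thread, and the final
patch may structure this differently):

/* svm_range_free() path: the only place the per-GPU dma_addr arrays are
 * freed.  The RAM->VRAM migration path would instead call only the
 * unmap helper and leave the arrays in place for reuse.
 */
static void svm_range_free_dma_mappings(struct svm_range *prange)
{
	uint32_t gpuidx;

	svm_range_dma_unmap_all(prange);
	for (gpuidx = 0; gpuidx < MAX_GPU_INSTANCE; gpuidx++) {
		kvfree(prange->dma_addr[gpuidx]);
		prange->dma_addr[gpuidx] = NULL;
	}
}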
Regards
Xiaogang
Regards,
Philip
Regards
Xiaogang
Regards,
Felix
Regards
Xiaogang