Re: [PATCH] drm/amdkfd: Set pte_flags for actual BO location

On 2022-08-30 02:00, Christian König wrote:
On 29.08.22 21:30, Felix Kuehling wrote:

On 2022-08-29 14:59, Christian König wrote:
On 29.08.22 18:07, Felix Kuehling wrote:
On 2022-08-29 11:38, Christian König wrote:
On 27.08.22 01:16, Felix Kuehling wrote:
BOs can be in a different location than was intended at allocation time, for example when restoring fails after an eviction or BOs get pinned in system memory. On some GPUs the MTYPE for coherent mappings depends on
the actual memory location.

Use the actual location to determine the pte_flags every time the page
tables are updated.

For a workaround this is ok, but it looks a bit awkward. Basically we need a different MTYPE based on the location, right?

Yes. On Aldebaran and Arcturus we need different MTYPEs for fine-grained coherence depending on the location.

It would be much cleaner to have a better description of how all this came to be in the mapping.

E.g. that we generate the flags in the VM code based on the requirements described in the mapping.

Do we have time to clean this up thoughtfully?

Currently we have two places in the KFD code that generate the mapping flags. I was planning to eliminate the duplication. I think you're suggesting moving it into the amdgpu_vm code instead. Unfortunately it's somewhat GPU-specific, so you probably won't like this code in the general amdgpu_vm code. Maybe the HW-specific part should be in gmc_v*.c.

We have the gmc_v*_get_vm_pte() and gmc_v*_get_vm_pde() functions exactly for that.


The requirements would include:

 * Memory type and mapping (local, system, PCIe P2P, XGMI P2P)
 * Memory model (coarse-grained or fine-grained coherence)

The question is whether any of this is per-BO or per-mapping information?

With XGMI, we are sharing the same BO between multiple GPUs. Local and remote mappings of the same BO can be different. I think a per-BO attribute to decide the memory model makes sense. Then the mapping code needs to figure out the flags from the per-BO memory model and the type of mapping on the fly. gmc_v*_get_vm_pte() seems to have all the parameters necessary to figure this out.
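
As a rough illustration of that split, the standalone C sketch below models the decision being discussed: the coherence model is a per-BO attribute, while the mapping kind (local VRAM, XGMI peer, or system memory) and the BO's current placement are evaluated per mapping at update time. The enum and function names are made up for the sketch and are not kernel API; the cases only mirror the Arcturus/Aldebaran logic visible in get_pte_flags() in the patch below.

#include <stdbool.h>

/* Hypothetical per-BO coherence attribute (not kernel API). */
enum sk_bo_coherence { SK_BO_COARSE, SK_BO_FINE_GRAINED, SK_BO_UNCACHED };

/* Hypothetical classification of how this GPU reaches the BO right now. */
enum sk_map_kind { SK_MAP_LOCAL_VRAM, SK_MAP_XGMI_VRAM, SK_MAP_SYSTEM };

/* Abstract MTYPE labels standing in for AMDGPU_VM_MTYPE_*. */
enum sk_mtype { SK_MTYPE_RW, SK_MTYPE_NC, SK_MTYPE_CC, SK_MTYPE_UC };

/*
 * Per-mapping decision: combine the per-BO coherence model with the
 * mapping kind, which itself depends on the BO's *current* placement
 * and on whether the mapping GPU owns the BO or reaches it over XGMI.
 */
static enum sk_mtype sk_pick_mtype(enum sk_bo_coherence coh,
				   enum sk_map_kind kind)
{
	switch (kind) {
	case SK_MAP_LOCAL_VRAM:
		/* Local VRAM: uncached, cache-coherent, or cached RW. */
		if (coh == SK_BO_UNCACHED)
			return SK_MTYPE_UC;
		return coh == SK_BO_FINE_GRAINED ? SK_MTYPE_CC : SK_MTYPE_RW;
	case SK_MAP_XGMI_VRAM:
	case SK_MAP_SYSTEM:
		/* Remote or system memory: fine-grained falls back to UC. */
		if (coh != SK_BO_COARSE)
			return SK_MTYPE_UC;
		return SK_MTYPE_NC;
	}
	return SK_MTYPE_UC;
}

In this model a migration only changes the mapping kind; the coherence attribute stays with the BO, which is roughly the lookup gmc_v*_get_vm_pte() would need to perform.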



For the gfx side we have unfortunately put the MTYPE into the mapping (which turned out to be a bad idea).

So as far as I understand it, the MTYPE should rather be obtained from per-buffer flags and the current placement, correct?

Yes. I can work on a patch for that.

Regards,
  Felix
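
A minimal sketch of that direction, for illustration only: a gmc_v*_get_vm_pte()-style hook that derives the MTYPE from per-buffer flags plus the buffer's current placement each time the PTEs are generated, instead of an MTYPE frozen into the mapping. The struct and flag names below are stand-ins, not the actual patch; in the driver the placement test would be along the lines of the tbo.resource->mem_type == TTM_PL_VRAM check used in the KFD patch below.

#include <stdbool.h>
#include <stdint.h>

/* Stand-in per-buffer allocation flags (hypothetical names). */
#define SK_BO_FLAG_COHERENT  (1u << 0)
#define SK_BO_FLAG_UNCACHED  (1u << 1)

/* Stand-in PTE flag bits, for illustration only. */
#define SK_PTE_MTYPE_UC  (1ull << 0)
#define SK_PTE_MTYPE_CC  (1ull << 1)
#define SK_PTE_MTYPE_RW  (1ull << 2)
#define SK_PTE_MTYPE_NC  (1ull << 3)
#define SK_PTE_SNOOPED   (1ull << 4)

struct sk_bo {
	uint32_t flags;        /* per-buffer flags, set at allocation time */
	bool in_vram;          /* current placement, re-read at update time */
	bool local_to_gpu;     /* bo_adev == adev in the real code */
};

/*
 * Sketch of a gmc_v*_get_vm_pte()-style hook: the MTYPE follows the
 * buffer's flags and its current placement, so a migration or a failed
 * restore is reflected the next time the page tables are updated.
 */
static void sk_get_vm_pte(const struct sk_bo *bo, uint64_t *pte_flags)
{
	bool coherent = bo->flags & SK_BO_FLAG_COHERENT;
	bool uncached = bo->flags & SK_BO_FLAG_UNCACHED;

	*pte_flags &= ~(SK_PTE_MTYPE_UC | SK_PTE_MTYPE_CC |
			SK_PTE_MTYPE_RW | SK_PTE_MTYPE_NC);

	if (bo->in_vram && bo->local_to_gpu) {
		if (uncached)
			*pte_flags |= SK_PTE_MTYPE_UC;
		else if (coherent)
			*pte_flags |= SK_PTE_MTYPE_CC;
		else
			*pte_flags |= SK_PTE_MTYPE_RW;
	} else {
		/* Remote VRAM or system memory. */
		*pte_flags |= (uncached || coherent) ? SK_PTE_MTYPE_UC
						     : SK_PTE_MTYPE_NC;
		if (!bo->in_vram)
			*pte_flags |= SK_PTE_SNOOPED;
	}
}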



Regards,
Christian.


Regards,
  Felix



Regards,
Christian.


Regards,
  Felix



Christian.


Signed-off-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
---
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  9 ++++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 19 +++++++++++++++++++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |  1 +
  3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index cbd593f7d553..5dd89f5a032f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -405,6 +405,7 @@ static int vm_update_pds(struct amdgpu_vm *vm, struct amdgpu_sync *sync)
  static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
  {
      struct amdgpu_device *bo_adev = amdgpu_ttm_adev(mem->bo->tbo.bdev);
+    bool is_vram = mem->bo->tbo.resource->mem_type == TTM_PL_VRAM;
      bool coherent = mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_COHERENT;
      bool uncached = mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED;
      uint32_t mapping_flags;
@@ -420,7 +421,7 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
      switch (adev->asic_type) {
      case CHIP_ARCTURUS:
      case CHIP_ALDEBARAN:
-        if (mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+        if (is_vram) {
              if (bo_adev == adev) {
                  if (uncached)
                      mapping_flags |= AMDGPU_VM_MTYPE_UC;
@@ -1236,12 +1237,18 @@ static int update_gpuvm_pte(struct kgd_mem *mem,
  {
      struct amdgpu_bo_va *bo_va = entry->bo_va;
      struct amdgpu_device *adev = entry->adev;
+    uint64_t pte_flags = get_pte_flags(adev, mem);
      int ret;

      ret = kfd_mem_dmamap_attachment(mem, entry);
      if (ret)
          return ret;

+    if (unlikely(entry->pte_flags != pte_flags)) {
+        amdgpu_vm_bo_update_flags(bo_va, pte_flags);
+        entry->pte_flags = pte_flags;
+    }
+
      /* Update the page tables  */
      ret = amdgpu_vm_bo_update(adev, bo_va, false);
      if (ret) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 59cac347baa3..954a40d5d828 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1862,6 +1862,25 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
      }
  }

+/**
+ * amdgpu_vm_bo_update_flags - Update mapping flags of invalid mappings
+ *
+ * @bo_va: identifies the BO and VM
+ * @flags: new mapping flags
+ *
+ * The update is only applied to invalid mappings. This allows updating the
+ * mapping flags after a migration to maintain the desired coherence. The next
+ * call to amdgpu_vm_bo_update() will apply the new @flags to the page table.
+ */
+void amdgpu_vm_bo_update_flags(struct amdgpu_bo_va *bo_va,
+                   uint64_t flags)
+{
+    struct amdgpu_bo_va_mapping *mapping;
+
+    list_for_each_entry(mapping, &bo_va->invalids, list)
+        mapping->flags = flags;
+}
+
  /**
   * amdgpu_vm_get_block_size - calculate VM page table size as power of two
   *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 9ecb7f663e19..11793716cd8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -413,6 +413,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
  bool amdgpu_vm_evictable(struct amdgpu_bo *bo);
  void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
                   struct amdgpu_bo *bo, bool evicted);
+void amdgpu_vm_bo_update_flags(struct amdgpu_bo_va *bo_va, uint64_t flags);
  uint64_t amdgpu_vm_map_gart(const dma_addr_t *pages_addr, uint64_t addr);
  struct amdgpu_bo_va *amdgpu_vm_bo_find(struct amdgpu_vm *vm,
                         struct amdgpu_bo *bo);





