On 2021-05-26 at 3:21 p.m., Eric Huang wrote:
>
> On 2021-05-25 3:16 p.m., Felix Kuehling wrote:
>> Similar to a recent fix by Philip Yang, 76e08b37d0aa ("drm/amdgpu: flush
>> TLB if valid PDE turns into PTE"), there needs to be a conditional TLB
>> flush after map if any PDEs were unmapped and turned into PTEs in the
>> process. This is currently returned by amdgpu_vm_bo_update_mapping in
>> the "table_freed" parameter. It also needs to be returned by
>> amdgpu_vm_bo_update and reported back to KFD, so KFD can do the TLB
>> flush after map, if needed.
> I followed up on your suggestion, created another patch (attached) and
> tested it. It seems it doesn't improve the latency when the memory size
> is bigger than a huge page (2M), because the table_freed parameter will
> always be true when mapping at the huge page size. I think Philip's
> patch is meant to fix the case of remapping memory from small pages to
> huge pages in HMM, but it doesn't check whether the memory is actually
> being remapped and unconditionally flushes TLBs when mapping huge pages.

That's unexpected. Turning an invalid PDE into a valid (huge) PTE should
not trigger a TLB flush.

Regards,
  Felix

>> kfd_flush_tlb probably needs a new parameter to determine the flush
>> type. The flush after map can be a "legacy" flush (type 0). The flush
>> after unmap must be a "heavy-weight" flush (type 2) to make sure we
>> don't evict cache lines into pages that we no longer own.
>>
>> Finally, in the ticket I thought about possible optimizations using a
>> worker to minimize the impact of TLB flushes on unmap latency. That
>> could be a follow-up commit.
> It is a good idea to use a worker, but how do we guarantee it is done
> before the memory is remapped? If remapping depends on it, then more
> latency will be introduced on map.
>
> Regards,
> Eric
>> Regards,
>>   Felix
>>
>>
>> On 2021-05-25 at 1:53 p.m., Eric Huang wrote:
>>> It is to optimize memory allocation latency.
>>>
>>> Signed-off-by: Eric Huang <jinhuieric.huang@xxxxxxx>
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> index 960913a35ee4..ab73741edb97 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>>> @@ -1657,20 +1657,6 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>>>  			goto sync_memory_failed;
>>>  	}
>>>
>>> -	/* Flush TLBs after waiting for the page table updates to complete */
>>> -	for (i = 0; i < args->n_devices; i++) {
>>> -		peer = kfd_device_by_id(devices_arr[i]);
>>> -		if (WARN_ON_ONCE(!peer))
>>> -			continue;
>>> -		peer_pdd = kfd_get_process_device_data(peer, p);
>>> -		if (WARN_ON_ONCE(!peer_pdd))
>>> -			continue;
>>> -		if (!amdgpu_read_lock(peer->ddev, true)) {
>>> -			kfd_flush_tlb(peer_pdd);
>>> -			amdgpu_read_unlock(peer->ddev);
>>> -		}
>>> -	}
>>> -
>>>  	kfree(devices_arr);
>>>
>>>  	trace_kfd_map_memory_to_gpu_end(p,
>>> @@ -1766,6 +1752,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>>>  			amdgpu_read_unlock(peer->ddev);
>>>  			goto unmap_memory_from_gpu_failed;
>>>  		}
>>> +		kfd_flush_tlb(peer_pdd);
>>>  		amdgpu_read_unlock(peer->ddev);
>>>  		args->n_success = i+1;
>>>  	}
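
To make the table_freed idea concrete: a conditional flush in
kfd_ioctl_map_memory_to_gpu might look roughly like the sketch below. This
is only a sketch, assuming "table_freed" has been plumbed up from
amdgpu_vm_bo_update_mapping() through amdgpu_vm_bo_update() into the KFD
map path, and assuming a hypothetical flush-type argument to
kfd_flush_tlb(); it is not the actual patch from this thread.

	/* Sketch: only flush after map when the page-table update freed a
	 * page table, i.e. a PDE was replaced by a PTE. "table_freed" is
	 * assumed to be returned by amdgpu_vm_bo_update(). */
	if (table_freed) {
		for (i = 0; i < args->n_devices; i++) {
			peer = kfd_device_by_id(devices_arr[i]);
			if (WARN_ON_ONCE(!peer))
				continue;
			peer_pdd = kfd_get_process_device_data(peer, p);
			if (WARN_ON_ONCE(!peer_pdd))
				continue;
			if (!amdgpu_read_lock(peer->ddev, true)) {
				/* a "legacy" flush (type 0) suffices after map */
				kfd_flush_tlb(peer_pdd, TLB_FLUSH_LEGACY);
				amdgpu_read_unlock(peer->ddev);
			}
		}
	}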
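
The flush-type parameter Felix suggests could be expressed as an enum.
Only the numeric values 0 and 2 come from the thread; the enum and the
names below are made up for illustration:

/* Illustrative flush types; names are hypothetical. */
enum kfd_tlb_flush_type {
	TLB_FLUSH_LEGACY      = 0,	/* after map: invalidate stale PTEs */
	TLB_FLUSH_HEAVYWEIGHT = 2,	/* after unmap: also make sure no cache
					 * lines get evicted into pages we no
					 * longer own */
};

void kfd_flush_tlb(struct kfd_process_device *pdd,
		   enum kfd_tlb_flush_type type);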
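
The worker idea, and Eric's ordering concern, can be sketched with a
standard work_struct (all names below are hypothetical). The catch is
visible in the sketch: before the unmapped pages may be reused, something
has to wait for the worker with flush_work(), so the latency moves from
unmap to the next map rather than disappearing.

/* Hypothetical deferred heavy-weight flush after unmap. */
struct kfd_tlb_flush_work {
	struct work_struct work;
	struct kfd_process_device *pdd;
};

static void kfd_tlb_flush_worker(struct work_struct *work)
{
	struct kfd_tlb_flush_work *fw =
		container_of(work, struct kfd_tlb_flush_work, work);

	kfd_flush_tlb(fw->pdd, TLB_FLUSH_HEAVYWEIGHT);
	kfree(fw);
}

static int kfd_queue_tlb_flush(struct kfd_process_device *pdd)
{
	struct kfd_tlb_flush_work *fw = kzalloc(sizeof(*fw), GFP_KERNEL);

	if (!fw)
		return -ENOMEM;	/* caller could fall back to a sync flush */

	INIT_WORK(&fw->work, kfd_tlb_flush_worker);
	fw->pdd = pdd;
	schedule_work(&fw->work);
	/* To guarantee ordering, the pending work would have to be tracked,
	 * e.g. in the pdd, so a later map can flush_work() it before the
	 * pages are handed out again, which is the cost Eric points out. */
	return 0;
}

Whether that trade is worth it depends on how often unmapped memory is
remapped immediately, which is exactly the open question in the thread.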