On 2023-06-05 11:47, Christian König wrote:
Am 05.06.23 um 17:40 schrieb Shashank Sharma:
On 05/06/2023 17:18, Christian König wrote:
Am 05.06.23 um 17:13 schrieb Shashank Sharma:
On 02/06/2023 16:54, Felix Kuehling wrote:
Am 2023-06-02 um 07:57 schrieb Christian König:
Am 01.06.23 um 21:31 schrieb Philip Yang:
To free page table BOs which are freed when updating the page table,
for example PTE BOs when PDE0 is used as PTE.
Signed-off-by: Philip Yang <Philip.Yang@xxxxxxx>
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index af0a4b5257cc..0ff007a74d03 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -2101,6 +2101,11 @@ void kfd_flush_tlb(struct kfd_process_device *pdd, enum TLB_FLUSH_TYPE type)
amdgpu_amdkfd_flush_gpu_tlb_pasid(
dev->adev, pdd->process->pasid, type, xcc);
}
+
+ /* Signal page table fence to free page table BOs */
+ dma_fence_signal(vm->pt_fence);
That's not something you can do here.
Signaling a fence can never depend on anything except for
hardware work. In other words, dma_fence_signal() is supposed to
be called only from interrupt context!
We are signaling eviction fences from normal user context, too.
There is no practical way to put this into an interrupt handler
when the TLB flush is being done synchronously on a user thread.
We have to do this in such a context for user mode queues.
We are currently working on providing a high-level kernel API
which can be called directly to perform a TLB flush. Internally
this API will queue deferred work that uses the SDMA engine to
perform the GPU TLB flush (to compensate for a HW bug in Navi
chips). If my understanding is right, by interrupt context
Christian means to perform this flush and signal from that
deferred work, is that so @Christian?
Well, more or less. Ideally you put the TLB flush in a work item (or
use the SDMA for the hw bug workaround on Navi 1x).
The point is that you shouldn't have it in the same execution thread
as the VM page table updates, because any memory allocation or
grabbing a lock could potentially depend on the TLB flush as soon as
you have published the dma_fence (by adding it to the VM page table
BOs for example).
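For illustration, a minimal sketch of that work-item approach might
look like the following; the struct, helper names, and the xcc value
here are assumptions for the example, not existing amdgpu code:

#include <linux/dma-fence.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* Hypothetical container for one deferred TLB flush. */
struct tlb_flush_work {
	struct work_struct work;
	struct kfd_process_device *pdd;
	enum TLB_FLUSH_TYPE type;
	struct dma_fence *fence;	/* the fence published on the PT BOs */
};

static void tlb_flush_work_func(struct work_struct *work)
{
	struct tlb_flush_work *w =
		container_of(work, struct tlb_flush_work, work);

	/* Nothing here may allocate memory or take a lock that could
	 * depend on w->fence; the flush only waits for hardware, so
	 * signaling afterwards is safe in this context. */
	amdgpu_amdkfd_flush_gpu_tlb_pasid(w->pdd->dev->adev,
					  w->pdd->process->pasid,
					  w->type, 0 /* xcc, assumed */);
	dma_fence_signal(w->fence);
	dma_fence_put(w->fence);	/* drop the work's reference */
	kfree(w);
}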
Would it work for everyone if we add this generic API (say
amdgpu_flush_tlb_async()) which queues the TLB flush work and also
sends the dma_fence_signal from within? KFD can simply consume it
from wherever they want. Do you see a race condition if we do it
like this?
Yes, that's pretty much the whole idea. amdgpu_flush_tlb() should just
return a dma_fence object.
This dma_fence object should either be the SDMA workaround or signaled
from a work item.
We can then fence the BOs or just wait for the dma_fence object to
signal.
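A rough sketch of the shape such an API could take, reusing the
hypothetical tlb_flush_work above (the signature and error handling
are assumptions; amdgpu_pt_fence_create() is the helper from the
patch):

struct dma_fence *amdgpu_flush_tlb_async(struct kfd_process_device *pdd,
					 enum TLB_FLUSH_TYPE type)
{
	struct tlb_flush_work *w = kzalloc(sizeof(*w), GFP_KERNEL);

	if (!w)
		return NULL;

	w->pdd = pdd;
	w->type = type;
	w->fence = amdgpu_pt_fence_create();
	INIT_WORK(&w->work, tlb_flush_work_func);
	schedule_work(&w->work);

	/* Extra reference so the caller can publish or wait on it. */
	return dma_fence_get(w->fence);
}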
In order to fix the original issue, page table BOs being freed and
reused before the TLB is flushed, I will rebase and modify the patch
series on top of the new API amdgpu_flush_tlb_sync(), which works for
both KFD and gfx. Is my understanding correct?
Regards,
Philip
Regards,
Christian.
- Shashank
Christian.
- Shashank
Regards,
Felix
What we can do is to put the TLB flushing into an irq worker or
work item and let the signaling happen from there.
Amar and Shashank are already working on this; I strongly suggest
syncing up with them.
Regards,
Christian.
+ dma_fence_put(vm->pt_fence);
+ vm->pt_fence = amdgpu_pt_fence_create();
}
struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)