Re: [PATCH 1/5] drm/amdgpu: allow direct submission in the VM backends

On 02.07.19 at 21:35, Kuehling, Felix wrote:
This assumes that page tables are resident when a page fault is handled.

Yeah, that is correct. I also haven't completely figured out how we can prevent the VM from being destroyed while handling the fault.

I mean it's perfectly possible that the process is killed while faults are still in the pipeline.

I think it's possible that a page table gets evicted while a page fault
is pending. Maybe not with graphics, but definitely with compute where
waves can be preempted while waiting for a page fault. In that case the
direct access would break.

Even with graphics I think it's still possible that new page tables need
to be allocated to handle a page fault. When that happens, you need to
wait for fences to let new page tables be validated and initialized.

Yeah, the problem here is that when you wait on fences which in turn depend on your submission, you end up in a deadlock.

Patch #2 deals with updating page directories. That pretty much implies
that page tables have moved or been reallocated. Wouldn't that be a
counter-indication for using direct page table updates? In other words,
if you need to update page directories, a direct page table update is
unsafe by definition.

No, page tables are allocated because this is for handling invalid faults.

E.g. we get a fault for an address where nothing is mapped and just want to silence it.

I will try to implement something to at least avoid accessing a destructed VM.

Regards,
Christian.


Regards,
    Felix

On 2019-06-28 8:18 a.m., Christian König wrote:
This allows us to update page tables directly while in a page fault.

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |  5 ++++
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c  |  4 +++
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 29 +++++++++++++--------
   3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 489a162ca620..5941accea061 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -197,6 +197,11 @@ struct amdgpu_vm_update_params {
   	 */
   	struct amdgpu_vm *vm;
+	/**
+	 * @direct: if changes should be made directly
+	 */
+	bool direct;
+
   	/**
   	 * @pages_addr:
   	 *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
index 5222d165abfc..f94e4896079c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
@@ -49,6 +49,10 @@ static int amdgpu_vm_cpu_prepare(struct amdgpu_vm_update_params *p, void *owner,
   {
   	int r;
+	/* Don't wait for anything during page fault */
+	if (p->direct)
+		return 0;
+
   	/* Wait for PT BOs to be idle. PTs share the same resv. object
   	 * as the root PD BO
   	 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index ddd181f5ed37..891d597063cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -68,17 +68,17 @@ static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
   	if (r)
   		return r;
-	r = amdgpu_sync_fence(p->adev, &p->job->sync, exclusive, false);
-	if (r)
-		return r;
+	p->num_dw_left = ndw;
+
+	if (p->direct)
+		return 0;
-	r = amdgpu_sync_resv(p->adev, &p->job->sync, root->tbo.resv,
-			     owner, false);
+	r = amdgpu_sync_fence(p->adev, &p->job->sync, exclusive, false);
   	if (r)
   		return r;
-	p->num_dw_left = ndw;
-	return 0;
+	return amdgpu_sync_resv(p->adev, &p->job->sync, root->tbo.resv,
+				owner, false);
   }
/**
@@ -99,13 +99,21 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
   	struct dma_fence *f;
   	int r;
-	ring = container_of(p->vm->entity.rq->sched, struct amdgpu_ring, sched);
+	if (p->direct)
+		ring = p->adev->vm_manager.page_fault;
+	else
+		ring = container_of(p->vm->entity.rq->sched,
+				    struct amdgpu_ring, sched);
 	WARN_ON(ib->length_dw == 0);
   	amdgpu_ring_pad_ib(ring, ib);
   	WARN_ON(ib->length_dw > p->num_dw_left);
-	r = amdgpu_job_submit(p->job, &p->vm->entity,
-			      AMDGPU_FENCE_OWNER_VM, &f);
+
+	if (p->direct)
+		r = amdgpu_job_submit_direct(p->job, ring, &f);
+	else
+		r = amdgpu_job_submit(p->job, &p->vm->entity,
+				      AMDGPU_FENCE_OWNER_VM, &f);
   	if (r)
   		goto error;
@@ -120,7 +128,6 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
   	return r;
   }
-
   /**
    * amdgpu_vm_sdma_copy_ptes - copy the PTEs from mapping
    *

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
