[PATCH 6/9] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS

john@xxxxxxxxxxxxx (John Brooks) · Sun, 25 Jun 2017 01:57:19 +0000

On Sat, Jun 24, 2017 at 08:00:05PM +0200, Christian KÃ¶nig wrote:
> Am 23.06.2017 um 19:39 schrieb John Brooks:
> >When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace,
> >it should only be treated as a hint to initially place a BO somewhere CPU
> >accessible, rather than having a permanent effect on BO placement.
> 
> And that is a clear NAK from my side.
> 
> CPU_ACCESS_REQUIRED is a permanent limitation to where the buffer should be
> placed.
> 
> Ignoring that and changing the IOCTL interface like this is clearly not
> acceptable.
> 
> Christian.
> 

In that case, I will remove this patch. But I will keep in some form the
safeguard against BOs oscillating between VRAM and GTT, as patch 3 allows page
faults to move BOs to GTT (more frequently than before, anyway). If userspace
has *not* marked said BOs with CPU_ACCESS_REQUIRED, amdgpu_cs_bo_validate will
move them to invisible VRAM, where they may promptly generate another page
fault. Cue framerate death. The same problem was seen last month in patch 2 of
the "Tweaks for high pressure on CPU visible VRAM" series.

John

> >
> >Instead of the flag being set in stone at BO creation, set the flag when a
> >page fault occurs so that it goes somewhere CPU-visible, and clear it when
> >the BO is requested by the GPU.
> >
> >However, clearing the CPU_ACCESS_REQUIRED flag may move a BO to invisible
> >VRAM, which is likely to cause a page fault that moves it right back to
> >GTT. When this happens too much, it is highly detrimental to performance.
> >Only clear the flag on CS if:
> >
> >- The BO wasn't page faulted for a certain amount of time (currently 10
> >   seconds, measured with jiffies), and
> >- its last page fault didn't occur too soon (currently 500ms) after its
> >   last CS request, or vice versa.
> >
> >Setting the flag in amdgpu_fault_reserve_notify() also means that we can
> >remove the loop to restrict lpfn to the end of visible VRAM, because
> >amdgpu_ttm_placement_init() will do it for us.
> >
> >Signed-off-by: John Brooks <john at fastquake.com>
> >---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     |  1 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 46 ++++++++++++++++++++++--------
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
> >  3 files changed, 36 insertions(+), 12 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >index 2fad8bd..73d6882 100644
> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >@@ -320,6 +320,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p,
> >  	else
> >  		domain = bo->allowed_domains;
> >+	amdgpu_bo_clear_cpu_access_required(bo);
> >  retry:
> >  	amdgpu_ttm_placement_from_domain(bo, domain);
> >  	initial_bytes_moved = atomic64_read(&adev->num_bytes_moved);
> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >index 31d1f21..a7d48a7 100644
> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >@@ -967,8 +967,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
> >  {
> >  	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
> >  	struct amdgpu_bo *abo;
> >-	unsigned long offset, size, lpfn;
> >-	int i, r;
> >+	unsigned long offset, size;
> >+	int r;
> >  	if (!amdgpu_ttm_bo_is_amdgpu_bo(bo))
> >  		return 0;
> >@@ -991,18 +991,9 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
> >  	/* hurrah the memory is not visible ! */
> >  	atomic64_inc(&adev->num_vram_cpu_page_faults);
> >+	abo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> >  	amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
> >  					 AMDGPU_GEM_DOMAIN_GTT);
> >-	lpfn =	adev->mc.visible_vram_size >> PAGE_SHIFT;
> >-	for (i = 0; i < abo->placement.num_placement; i++) {
> >-		/* Try to move the BO into visible VRAM */
> >-		if ((abo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
> >-		    (!abo->placements[i].lpfn ||
> >-		     abo->placements[i].lpfn > lpfn))
> >-			abo->placements[i].lpfn = lpfn;
> >-	}
> >-	abo->placement.busy_placement = abo->placement.placement;
> >-	abo->placement.num_busy_placement = abo->placement.num_placement;
> >  	r = ttm_bo_validate(bo, &abo->placement, false, false);
> >  	if (unlikely(r != 0))
> >  		return r;
> >@@ -1057,3 +1048,34 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
> >  	return bo->tbo.offset;
> >  }
> >+
> >+/**
> >+ * amdgpu_bo_clear_cpu_access_required
> >+ * @bo: BO to update
> >+ *
> >+ * Clears CPU_ACCESS_REQUIRED flag if the BO hasn't had a page fault in a while
> >+ * and it didn't have a page fault too soon after the last time it was moved to
> >+ * VRAM.
> >+ *
> >+ * Caller should have bo reserved.
> >+ *
> >+ */
> >+void amdgpu_bo_clear_cpu_access_required(struct amdgpu_bo *bo)
> >+{
> >+	const unsigned int page_fault_timeout_ms = 10000;
> >+	const unsigned int min_period_ms = 500;
> >+	unsigned int ms_since_pf, period_ms;
> >+
> >+	ms_since_pf = jiffies_to_msecs(jiffies - bo->last_page_fault_jiffies);
> >+	period_ms = jiffies_to_msecs(abs(bo->last_page_fault_jiffies -
> >+					 bo->last_cs_move_jiffies));
> >+
> >+	/*
> >+	 * Try to avoid a revolving door between GTT and VRAM. Clearing the
> >+	 * flag may move this BO back to VRAM, so don't clear it if it's likely
> >+	 * to page fault and go right back to GTT.
> >+	 */
> >+	if ((!bo->last_page_fault_jiffies || !bo->last_cs_move_jiffies) ||
> >+	    (ms_since_pf > page_fault_timeout_ms && period_ms > min_period_ms))
> >+		bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> >+}
> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> >index 3824851..b0cb137 100644
> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> >@@ -182,6 +182,7 @@ int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
> >  				  struct reservation_object *resv,
> >  				  struct dma_fence **fence,
> >  				  bool direct);
> >+void amdgpu_bo_clear_cpu_access_required(struct amdgpu_bo *bo);
> >  /*
> 
>