Re: [PATCH v9] drm/i915: Support to enable TRTT on GEN9

On 3/24/2016 9:59 PM, Gore, Tim wrote:

Tim Gore
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ

-----Original Message-----
From: Intel-gfx [mailto:intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf
Of akash.goel@xxxxxxxxx
Sent: Tuesday, March 22, 2016 8:43 AM
To: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Goel, Akash
Subject:  [PATCH v9] drm/i915: Support to enable TRTT on GEN9

From: Akash Goel <akash.goel@xxxxxxxxx>

Gen9 has additional address translation hardware support in the form of a
Tiled Resource Translation Table (TR-TT), which provides an extra level of
abstraction over PPGTT.
This is useful for mapping Sparse/Tiled texture resources.
Sparse resources are created as virtual-only allocations. Regions of the
resource that the application intends to use are bound to physical memory
on the fly and can be re-bound to different memory allocations over the
lifetime of the resource.

TR-TT is tightly coupled with PPGTT: a new instance of TR-TT is required
for each new PPGTT instance, but TR-TT may not be enabled for every context.
1/16th of the 48-bit PPGTT space is earmarked for translation by TR-TT;
which such chunk to use is conveyed to HW through a register.
Any GFX address that lies in that reserved 44-bit range is translated
through TR-TT first and then through PPGTT to get the actual physical
address, so the output of translation from TR-TT is a PPGTT offset.
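
As an illustration only (not part of the patch), the segment arithmetic
described above can be sketched as follows. The helper names are made up
for this example; the shift and mask values are the ones the patch defines
as GEN9_TRTT_SEG_SIZE_SHIFT, GEN9_TRVA_MASK_VALUE and GEN9_TRVA_DATA_MASK.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRTT_SEG_SIZE_SHIFT 44                 /* 48-bit space / 16 segments */
#define PPGTT_ADDR_MASK     ((1ULL << 48) - 1) /* 48-bit PPGTT address space */
#define TRVA_MASK_VALUE     0xF0u              /* same value as in the patch */
#define TRVA_DATA_MASK      0xFu

/* Segment number (0..15) that a 48-bit GFX address falls in. */
static inline unsigned int trtt_segment_of(uint64_t gfx_addr)
{
	return (unsigned int)((gfx_addr & PPGTT_ADDR_MASK) >> TRTT_SEG_SIZE_SHIFT);
}

/* True if gfx_addr lies in the same 44-bit chunk as segment_base. */
static inline bool trtt_addr_in_segment(uint64_t gfx_addr, uint64_t segment_base)
{
	return trtt_segment_of(gfx_addr) == trtt_segment_of(segment_base);
}

/* Value the patch writes to GEN9_TRTT_VA_MASKDATA for a given segment base. */
static inline uint32_t trtt_va_maskdata(uint64_t segment_base)
{
	uint32_t data = (uint32_t)(segment_base >> TRTT_SEG_SIZE_SHIFT) & TRVA_DATA_MASK;

	return TRVA_MASK_VALUE | data;
}
```

With the topmost chunk (base 0xF00000000000) this gives segment 15 and a
MASKDATA value of 0xFF.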

TRTT is constructed as a 3-level tile table. Each tile is 64KB in size, which
leaves behind 44-16=28 address bits. These 28 bits are partitioned as 9+9+10,
and each level is contained within a 4KB page; hence L3 and L2 are each
composed of 512 64-bit entries and L1 is composed of 1024 32-bit entries.
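
A rough sketch of that 9+9+10+16 split, for illustration only. It assumes
the most significant of the 28 bits index L3 and the least significant
index L1; the exact bit positions are not spelled out in this message, and
the struct and function names are hypothetical.

```c
#include <stdint.h>

struct trtt_walk {
	unsigned int l3_index;    /* 9 bits: 512 x 64-bit entries per 4KB page  */
	unsigned int l2_index;    /* 9 bits: 512 x 64-bit entries per 4KB page  */
	unsigned int l1_index;    /* 10 bits: 1024 x 32-bit entries per 4KB page */
	unsigned int tile_offset; /* 16 bits: byte offset inside a 64KB tile     */
};

/* Split a GFX address from the TR-TT segment into per-level indices. */
static inline struct trtt_walk trtt_split(uint64_t gfx_addr)
{
	uint64_t off = gfx_addr & ((1ULL << 44) - 1); /* offset within segment */
	struct trtt_walk w;

	w.tile_offset = (unsigned int)(off & 0xFFFF);
	w.l1_index = (unsigned int)((off >> 16) & 0x3FF);
	w.l2_index = (unsigned int)((off >> 26) & 0x1FF);
	w.l3_index = (unsigned int)((off >> 35) & 0x1FF);
	return w;
}
```

The page sizes follow directly: 512 * 8 bytes and 1024 * 4 bytes are both
4096.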

There is a provision to keep the TR-TT tables in virtual space, where the pages
of the TR-TT tables are mapped to PPGTT.
Currently this is the supported mode; in this mode UMD has full
control over TR-TT management, with bare minimum support from KMD.
So the entries of the L3 table contain the PPGTT offsets of L2 table pages,
and similarly the entries of the L2 table contain the PPGTT offsets of L1
table pages. The entries of the L1 table contain the PPGTT offsets of the BOs
actually backing the Sparse resources.
UMD has to allocate the L3/L2/L1 table pages as regular BOs and
assign them a PPGTT address through the Soft Pin API (for example, use soft
pin to assign l3_table_address to the L3 table BO, when used).
UMD also programs the entries in the TR-TT page tables using regular
batch commands (MI_STORE_DATA_IMM), or via mmapping of the page
table BOs.
UMD may do the complete PPGTT address space management, on the
grounds that it could help minimize conflicts.

Any space in the TR-TT segment not bound to any Sparse texture will be handled
through the Invalid tile. The user is expected to initialize the entries of a
new L3/L2/L1 table page with the Invalid tile pattern. The entries
corresponding to the holes in the Sparse texture resource will be set with the
Null tile pattern.
Improper programming of TRTT should only lead to a recoverable GPU
hang, eventually leading to banning of the culprit context without victimizing
others.
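
For illustration, a minimal userspace-side sketch of that initialization for
an L1 table page that has been mmapped by UMD. Only the L1 case (1024 32-bit
entries) is shown, since how the 32-bit pattern is encoded in the 64-bit
L3/L2 entries is not described here. The function names are hypothetical and
the pattern values are whatever UMD passed for the Null & Invalid tile
registers.

```c
#include <stddef.h>
#include <stdint.h>

#define TRTT_L1_ENTRIES 1024 /* 1024 x 32-bit entries = one 4KB page */

/* Fill a freshly allocated L1 table page with the Invalid tile pattern. */
static void trtt_l1_page_init(uint32_t *l1_page, uint32_t invd_tile_val)
{
	for (size_t i = 0; i < TRTT_L1_ENTRIES; i++)
		l1_page[i] = invd_tile_val;
}

/* Mark up to 'count' entries starting at 'first' as holes (Null tile). */
static void trtt_l1_page_set_holes(uint32_t *l1_page, size_t first,
				   size_t count, uint32_t null_tile_val)
{
	/* Stop at the end of the page even if the range runs past it. */
	for (size_t i = first; i < TRTT_L1_ENTRIES && count > 0; i++, count--)
		l1_page[i] = null_tile_val;
}
```

Entries for tiles that are actually backed would then be overwritten with
the PPGTT offset of the backing BO, as described earlier.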

The association of any Sparse resource with the BOs will be known only to
UMD, and only the Sparse resources shall be assigned an offset from the
TR-TT segment by UMD. The use of the TR-TT segment or mapping of Sparse
resources will be transparent to the KMD; UMD will do the address
assignment from the TR-TT segment autonomously and KMD will be oblivious of
it.
No other object may be assigned an address from the TR-TT segment; such
objects will be mapped to PPGTT in the regular way by KMD.

This patch provides an interface through which UMD can ask KMD to
enable TR-TT for a given context. A new I915_CONTEXT_PARAM_TRTT param
has been added to the I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
UMD will have to pass the GFX address of the L3 table page and the start
location of the TR-TT segment, along with the pattern values for the Null &
Invalid tile registers.
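
A sketch of how UMD might sanity-check those values before issuing the
ioctl. The struct below mirrors the layout this patch adds to i915_drm.h
under a local, illustrative name; the checks are inferred from the masks
and shifts in the patch and are an assumption, not a copy of the kernel's
own argument validation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of drm_i915_gem_context_trtt_param (illustrative name). */
struct trtt_param_sketch {
	uint64_t segment_base_addr;
	uint64_t l3_table_address;
	uint32_t invd_tile_val;
	uint32_t null_tile_val;
};

#define TRTT_SEG_SIZE_SHIFT  44
#define TRTT_L3_GFXADDR_MASK 0xFFFFFFFF0000ULL /* bits 47:16, 64KB aligned */

static bool trtt_param_sketch_is_sane(const struct trtt_param_sketch *p)
{
	/* Segment base must be aligned to one of the 16 44-bit chunks. */
	if (p->segment_base_addr & ((1ULL << TRTT_SEG_SIZE_SHIFT) - 1))
		return false;
	/* ... and must lie inside the 48-bit PPGTT space. */
	if (p->segment_base_addr >> 48)
		return false;
	/* Bits outside the L3 pointer mask would be silently masked off. */
	if (p->l3_table_address & ~TRTT_L3_GFXADDR_MASK)
		return false;
	return true;
}
```

A pointer to such a struct would be passed through the 'value' field of
drm_i915_gem_context_param with param set to I915_CONTEXT_PARAM_TRTT.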

v2:
  - Support context_getparam for TRTT also and dispense with a separate
    GETPARAM case for TRTT (Chris).
  - Use i915_dbg to log errors for the invalid TRTT ABI parameters passed
    from user space (Chris).
  - Move all the argument checking for TRTT in context_setparam to the
    set_trtt function (Chris).
  - Change the type of 'flags' field inside 'intel_context' to unsigned (Chris)
  - Rename certain functions to rightly reflect their purpose, rename
    the new param for TRTT in gem_context_param to I915_CONTEXT_PARAM_TRTT,
    rephrase few lines in the commit message body, add more comments (Chris).
  - Extend ABI to allow User to specify TRTT segment location also.
  - Fix for selective enabling of TRTT on per context basis, explicitly
    disable TR-TT at the start of a new context.

v3:
  - Check the return value of gen9_emit_trtt_regs (Chris)
  - Update the kernel doc for intel_context structure.
  - Rebased.

v4:
  - Fix the warnings reported by 'checkpatch.pl --strict' (Michel)
  - Fix the context_getparam implementation avoiding the reset of size field,
    affecting the TRTT case.

v5:
  - Update the TR-TT params right away in context_setparam, by constructing
    & submitting a request emitting LRIs, instead of deferring it and
    conflating with the next batch submission (Chris)
  - Follow the struct_mutex handling related prescribed rules, while accessing
    User space buffer, both in context_setparam & getparam functions (Chris).

v6:
  - Fix the warning caused due to removal of un-allocated trtt vma node.

v7:
  - Move context ref/unref to context_setparam_ioctl from set_trtt() & remove
    that from get_trtt() as not really needed there (Chris).
  - Add a check for improper values for Null & Invalid Tiles.
  - Remove superfluous DRM_ERROR from trtt_context_allocate_vma (Chris).
  - Rebased.

v8:
  - Add context ref/unref to context_getparam_ioctl also so as to be consistent
    and ease the extension of ioctl in future (Chris)

v9:
  - Fix the handling of return value from trtt_context_allocate_vma() function,
    causing kernel panic at the time of destroying context, in case of
    unsuccessful allocation of trtt vma.
  - Rebased.

Testcase: igt/gem_trtt

Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Michel Thierry <michel.thierry@xxxxxxxxx>
Signed-off-by: Akash Goel <akash.goel@xxxxxxxxx>
Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_drv.h         |  16 +++-
  drivers/gpu/drm/i915/i915_gem_context.c | 157 +++++++++++++++++++++++++++++++-
  drivers/gpu/drm/i915/i915_gem_gtt.c     |  65 +++++++++++++
  drivers/gpu/drm/i915/i915_gem_gtt.h     |   8 ++
  drivers/gpu/drm/i915/i915_reg.h         |  19 ++++
  drivers/gpu/drm/i915/intel_lrc.c        | 124 ++++++++++++++++++++++++-
  drivers/gpu/drm/i915/intel_lrc.h        |   1 +
  include/uapi/drm/i915_drm.h             |   8 ++
  8 files changed, 393 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ecbd418..272d1f8 100644


@@ -2657,6 +2669,8 @@ struct drm_i915_cmd_table {
  				 !IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
  				 !IS_BROXTON(dev))

+#define HAS_TRTT(dev)		(IS_GEN9(dev))
+

A very minor point, but there is a w/a to disable TRTT for BXT_REVID_A0/1. I realise this
is basically obsolete now, but I'm still using one!

Thanks for raising this.
Michel & Thomas also apprised me of a similar WA for KBL.
I was thinking of submitting that as a follow-up patch.

Best regards
Akash
  	return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0715bb7..cbf8a03 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2169,6 +2169,17 @@ int i915_ppgtt_init_hw(struct drm_device *dev)
  {
  	gtt_write_workarounds(dev);

+	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
+		struct drm_i915_private *dev_priv = dev->dev_private;
+		/*
+		 * Globally enable TR-TT support in Hw.
+		 * Still TR-TT enabling on per context basis is required.
+		 * Non-trtt contexts are not affected by this setting.
+		 */
+		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
+			   GEN9_TRTT_BYPASS_DISABLE);
+	}
+
  	/* In the case of execlists, PPGTT is enabled by the context descriptor
  	 * and the PDPs are contained within the context itself.  We don't
  	 * need to do anything here. */
@@ -3362,6 +3373,60 @@ i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,

  }

+void intel_trtt_context_destroy_vma(struct i915_vma *vma)
+{
+	struct i915_address_space *vm = vma->vm;
+
+	WARN_ON(!list_empty(&vma->obj_link));
+	WARN_ON(!list_empty(&vma->vm_link));
+	WARN_ON(!list_empty(&vma->exec_list));
+
+	WARN_ON(!vma->pin_count);
+
+	if (drm_mm_node_allocated(&vma->node))
+		drm_mm_remove_node(&vma->node);
+
+	i915_ppgtt_put(i915_vm_to_ppgtt(vm));
+	kmem_cache_free(to_i915(vm->dev)->vmas, vma);
+}
+
+struct i915_vma *
+intel_trtt_context_allocate_vma(struct i915_address_space *vm,
+				uint64_t segment_base_addr)
+{
+	struct i915_vma *vma;
+	int ret;
+
+	vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
+	if (!vma)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&vma->obj_link);
+	INIT_LIST_HEAD(&vma->vm_link);
+	INIT_LIST_HEAD(&vma->exec_list);
+	vma->vm = vm;
+	i915_ppgtt_get(i915_vm_to_ppgtt(vm));
+
+	/* Mark the vma as permanently pinned */
+	vma->pin_count = 1;
+
+	/* Reserve from the 48 bit PPGTT space */
+	vma->node.start = segment_base_addr;
+	vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
+	ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+	if (ret) {
+		ret = i915_gem_evict_for_vma(vma);
+		if (ret == 0)
+			ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+	}
+	if (ret) {
+		intel_trtt_context_destroy_vma(vma);
+		return ERR_PTR(ret);
+	}
+
+	return vma;
+}
+
  static struct scatterlist *
  rotate_pages(const dma_addr_t *in, unsigned int offset,
  	     unsigned int width, unsigned int height,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d804be0..8cbaca2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -128,6 +128,10 @@ typedef uint64_t gen8_ppgtt_pml4e_t;
  #define GEN8_PPAT_ELLC_OVERRIDE		(0<<2)
  #define GEN8_PPAT(i, x)			((uint64_t) (x) << ((i) * 8))

+/* Fixed size segment */
+#define GEN9_TRTT_SEG_SIZE_SHIFT	44
+#define GEN9_TRTT_SEGMENT_SIZE		(1ULL << GEN9_TRTT_SEG_SIZE_SHIFT)
+
  enum i915_ggtt_view_type {
  	I915_GGTT_VIEW_NORMAL = 0,
  	I915_GGTT_VIEW_ROTATED,
@@ -560,4 +564,8 @@ size_t
  i915_ggtt_view_size(struct drm_i915_gem_object *obj,
  		    const struct i915_ggtt_view *view);

+struct i915_vma *
+intel_trtt_context_allocate_vma(struct i915_address_space *vm,
+				uint64_t segment_base_addr);
+void intel_trtt_context_destroy_vma(struct i915_vma *vma);
  #endif
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 264885f..07936b6 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -188,6 +188,25 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
  #define   GEN8_RPCS_EU_MIN_SHIFT	0
  #define   GEN8_RPCS_EU_MIN_MASK		(0xf << GEN8_RPCS_EU_MIN_SHIFT)

+#define GEN9_TR_CHICKEN_BIT_VECTOR	_MMIO(0x4DFC)
+#define   GEN9_TRTT_BYPASS_DISABLE	(1 << 0)
+
+/* TRTT registers in the H/W Context */
+#define GEN9_TRTT_L3_POINTER_DW0	_MMIO(0x4DE0)
+#define GEN9_TRTT_L3_POINTER_DW1	_MMIO(0x4DE4)
+#define   GEN9_TRTT_L3_GFXADDR_MASK	0xFFFFFFFF0000
+
+#define GEN9_TRTT_NULL_TILE_REG		_MMIO(0x4DE8)
+#define GEN9_TRTT_INVD_TILE_REG		_MMIO(0x4DEC)
+
+#define GEN9_TRTT_VA_MASKDATA		_MMIO(0x4DF0)
+#define   GEN9_TRVA_MASK_VALUE		0xF0
+#define   GEN9_TRVA_DATA_MASK		0xF
+
+#define GEN9_TRTT_TABLE_CONTROL		_MMIO(0x4DF4)
+#define   GEN9_TRTT_IN_GFX_VA_SPACE	(1 << 1)
+#define   GEN9_TRTT_ENABLE		(1 << 0)
+
  #define GAM_ECOCHK			_MMIO(0x4090)
  #define   BDW_DISABLE_HDC_INVALIDATION	(1<<25)
  #define   ECOCHK_SNB_BIT		(1<<10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3a23b95..8af480b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1645,6 +1645,76 @@ static int gen9_init_render_ring(struct intel_engine_cs *engine)
  	return init_workarounds_ring(engine);
  }

+static int gen9_init_rcs_context_trtt(struct drm_i915_gem_request *req)
+{
+	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	int ret;
+
+	ret = intel_logical_ring_begin(req, 2 + 2);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL);
+	intel_logical_ring_emit(ringbuf, 0);
+
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
+static int gen9_emit_trtt_regs(struct drm_i915_gem_request *req)
+{
+	struct intel_context *ctx = req->ctx;
+	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	u64 masked_l3_gfx_address =
+		ctx->trtt_info.l3_table_address & GEN9_TRTT_L3_GFXADDR_MASK;
+	u32 trva_data_value =
+		(ctx->trtt_info.segment_base_addr >> GEN9_TRTT_SEG_SIZE_SHIFT) &
+		GEN9_TRVA_DATA_MASK;
+	const int num_lri_cmds = 6;
+	int ret;
+
+	/*
+	 * Emitting LRIs to update the TRTT registers is most reliable, instead
+	 * of directly updating the context image, as this will ensure that
+	 * update happens in a serialized manner for the context and also
+	 * lite-restore scenario will get handled.
+	 */
+	ret = intel_logical_ring_begin(req, num_lri_cmds * 2 + 2);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_L3_POINTER_DW0);
+	intel_logical_ring_emit(ringbuf,
+				lower_32_bits(masked_l3_gfx_address));
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_L3_POINTER_DW1);
+	intel_logical_ring_emit(ringbuf,
+				upper_32_bits(masked_l3_gfx_address));
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_NULL_TILE_REG);
+	intel_logical_ring_emit(ringbuf, ctx->trtt_info.null_tile_val);
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_INVD_TILE_REG);
+	intel_logical_ring_emit(ringbuf, ctx->trtt_info.invd_tile_val);
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_VA_MASKDATA);
+	intel_logical_ring_emit(ringbuf,
+				GEN9_TRVA_MASK_VALUE | trva_data_value);
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL);
+	intel_logical_ring_emit(ringbuf,
+				GEN9_TRTT_IN_GFX_VA_SPACE | GEN9_TRTT_ENABLE);
+
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
  static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
  {
  	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
@@ -2003,6 +2073,25 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
  	return intel_lr_context_render_state_init(req);
  }

+static int gen9_init_rcs_context(struct drm_i915_gem_request *req)
+{
+	int ret;
+
+	/*
+	 * Explicitly disable TR-TT at the start of a new context.
+	 * Otherwise on switching from a TR-TT context to a new Non TR-TT
+	 * context the TR-TT settings of the outgoing context could get
+	 * spilled on to the new incoming context as only the Ring Context
+	 * part is loaded on the first submission of a new context, due to
+	 * the setting of ENGINE_CTX_RESTORE_INHIBIT bit.
+	 */
+	ret = gen9_init_rcs_context_trtt(req);
+	if (ret)
+		return ret;
+
+	return gen8_init_rcs_context(req);
+}
+
  /**
   * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer
   *
@@ -2134,11 +2223,14 @@ static int logical_render_ring_init(struct drm_device *dev)
  	logical_ring_default_vfuncs(dev, engine);

  	/* Override some for render ring. */
-	if (INTEL_INFO(dev)->gen >= 9)
+	if (INTEL_INFO(dev)->gen >= 9) {
  		engine->init_hw = gen9_init_render_ring;
-	else
+		engine->init_context = gen9_init_rcs_context;
+	} else {
  		engine->init_hw = gen8_init_render_ring;
-	engine->init_context = gen8_init_rcs_context;
+		engine->init_context = gen8_init_rcs_context;
+	}
+
  	engine->cleanup = intel_fini_pipe_control;
  	engine->emit_flush = gen8_emit_flush_render;
  	engine->emit_request = gen8_emit_request_render;
@@ -2702,3 +2794,29 @@ void intel_lr_context_reset(struct drm_device *dev,
  		ringbuf->tail = 0;
  	}
  }
+
+int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx)
+{
+	struct intel_engine_cs *engine = &(ctx->i915->engine[RCS]);
+	struct drm_i915_gem_request *req;
+	int ret;
+
+	if (!ctx->engine[RCS].state) {
+		ret = intel_lr_context_deferred_alloc(ctx, engine);
+		if (ret)
+			return ret;
+	}
+
+	req = i915_gem_request_alloc(engine, ctx);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	ret = gen9_emit_trtt_regs(req);
+	if (ret) {
+		i915_gem_request_cancel(req);
+		return ret;
+	}
+
+	i915_add_request(req);
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index a17cb12..f3600b2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -107,6 +107,7 @@ void intel_lr_context_reset(struct drm_device *dev,
  			struct intel_context *ctx);
  uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
  				     struct intel_engine_cs *engine);
+int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx);

  u32 intel_execlists_ctx_id(struct intel_context *ctx,
  			   struct intel_engine_cs *engine);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a5524cc..604da23 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1167,7 +1167,15 @@ struct drm_i915_gem_context_param {
  #define I915_CONTEXT_PARAM_BAN_PERIOD	0x1
  #define I915_CONTEXT_PARAM_NO_ZEROMAP	0x2
  #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
+#define I915_CONTEXT_PARAM_TRTT		0x4
  	__u64 value;
  };

+struct drm_i915_gem_context_trtt_param {
+	__u64 segment_base_addr;
+	__u64 l3_table_address;
+	__u32 invd_tile_val;
+	__u32 null_tile_val;
+};
+
  #endif /* _UAPI_I915_DRM_H_ */
--
1.9.2
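
For readers following the ring sizing in gen9_emit_trtt_regs(), here is a
standalone sketch (not driver code) that builds the same dword stream into
a plain array. The command encodings are mirrored from i915_reg.h; the
struct and function names are hypothetical. It shows where the
"num_lri_cmds * 2 + 2" comes from: one LRI header, six (register, value)
pairs, and a trailing MI_NOOP that keeps the dword count even.

```c
#include <stddef.h>
#include <stdint.h>

/* Command encodings mirrored from i915_reg.h. */
#define MI_INSTR(opcode, flags)  (((uint32_t)(opcode) << 23) | (uint32_t)(flags))
#define MI_NOOP                  MI_INSTR(0, 0)
#define MI_LOAD_REGISTER_IMM(x)  MI_INSTR(0x22, 2 * (x) - 1)

/* Register offsets and bits taken from the i915_reg.h hunk of this patch. */
#define TRTT_L3_POINTER_DW0   0x4DE0
#define TRTT_L3_POINTER_DW1   0x4DE4
#define TRTT_NULL_TILE_REG    0x4DE8
#define TRTT_INVD_TILE_REG    0x4DEC
#define TRTT_VA_MASKDATA      0x4DF0
#define TRTT_TABLE_CONTROL    0x4DF4
#define TRTT_L3_GFXADDR_MASK  0xFFFFFFFF0000ULL
#define TRTT_IN_GFX_VA_SPACE  (1u << 1)
#define TRTT_ENABLE           (1u << 0)

#define TRTT_NUM_LRI     6
#define TRTT_LRI_DWORDS  (TRTT_NUM_LRI * 2 + 2) /* header + 6 pairs + NOOP */

/* Hypothetical container for the values gen9_emit_trtt_regs() reads. */
struct trtt_info_sketch {
	uint64_t segment_base_addr;
	uint64_t l3_table_address;
	uint32_t invd_tile_val;
	uint32_t null_tile_val;
};

/* Fill cs[] with the LRI packet; returns the number of dwords written. */
static size_t trtt_build_lri(uint32_t cs[TRTT_LRI_DWORDS],
			     const struct trtt_info_sketch *t)
{
	uint64_t l3 = t->l3_table_address & TRTT_L3_GFXADDR_MASK;
	uint32_t seg = (uint32_t)(t->segment_base_addr >> 44) & 0xF;
	size_t n = 0;

	cs[n++] = MI_LOAD_REGISTER_IMM(TRTT_NUM_LRI);
	cs[n++] = TRTT_L3_POINTER_DW0;
	cs[n++] = (uint32_t)l3;         /* lower 32 bits */
	cs[n++] = TRTT_L3_POINTER_DW1;
	cs[n++] = (uint32_t)(l3 >> 32); /* upper 32 bits */
	cs[n++] = TRTT_NULL_TILE_REG;
	cs[n++] = t->null_tile_val;
	cs[n++] = TRTT_INVD_TILE_REG;
	cs[n++] = t->invd_tile_val;
	cs[n++] = TRTT_VA_MASKDATA;
	cs[n++] = 0xF0u | seg;
	cs[n++] = TRTT_TABLE_CONTROL;
	cs[n++] = TRTT_IN_GFX_VA_SPACE | TRTT_ENABLE;
	cs[n++] = MI_NOOP;              /* pad to an even dword count */
	return n;
}
```

The same accounting explains the "2 + 2" in gen9_init_rcs_context_trtt():
one header, one (register, value) pair, one MI_NOOP.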

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx