On 21/07/16 14:31, Chris Wilson wrote:
On Thu, Jul 21, 2016 at 02:16:22PM +0100, Tvrtko Ursulin wrote:
On 21/07/16 13:59, Chris Wilson wrote:
On Thu, Jul 21, 2016 at 01:00:47PM +0100, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Static table wastes space for invalid combinations and
engines which are not supported by Gen6 (legacy semaphores).
Replace it with a function devised by Dave Gordon.
I have verified that it generates the same mappings between
mbox selectors and signalling registers.
So just how big was that table? How big are the functions replacing it?
With I915_NUM_ENGINES of 5 table is 5 * 5 * (2 * 4) = 200 bytes.
With the patch .text grows by 144 bytes here and .rodata shrinks by
256. So a net gain of 112 bytes with my config. Conclusion is that
as long as we got five engines it is not that interesting to get rid
of the table.
v2: Add a comment describing what gen6_sem_f does.
v3: This time with git add.
I like having the table a lot... Even if we don't find the function
convincing we should add that comment.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Dave Gordon <david.s.gordon@xxxxxxxxx>
Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/i915/i915_reg.h | 7 +--
drivers/gpu/drm/i915/intel_engine_cs.c | 93 +++++++++++++++++++++++++++++++++
drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +-------------
drivers/gpu/drm/i915/intel_ringbuffer.h | 3 ++
4 files changed, 102 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 9397ddec26b9..c2fe718582c8 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1604,9 +1604,10 @@ enum skl_disp_power_wells {
#define RING_HEAD(base) _MMIO((base)+0x34)
#define RING_START(base) _MMIO((base)+0x38)
#define RING_CTL(base) _MMIO((base)+0x3c)
-#define RING_SYNC_0(base) _MMIO((base)+0x40)
-#define RING_SYNC_1(base) _MMIO((base)+0x44)
-#define RING_SYNC_2(base) _MMIO((base)+0x48)
+#define RING_SYNC(base, n) _MMIO((base) + 0x40 + (n) * 4)
+#define RING_SYNC_0(base) RING_SYNC(base, 0)
+#define RING_SYNC_1(base) RING_SYNC(base, 1)
+#define RING_SYNC_2(base) RING_SYNC(base, 2)
#define GEN6_RVSYNC (RING_SYNC_0(RENDER_RING_BASE))
#define GEN6_RBSYNC (RING_SYNC_1(RENDER_RING_BASE))
#define GEN6_RVESYNC (RING_SYNC_2(RENDER_RING_BASE))
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index f4a35ec78481..19455b20b322 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -209,3 +209,96 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
return i915_cmd_parser_init_ring(engine);
}
+
+#define I915_NUM_GEN6_SEMAPHORE_ENGINES (4)
+
+/*
+ * For Gen6 semaphores where the driver issues MI_SEMAPHORE_MBOX commands
+ * with register selects so that a specific engine can wake up another engine
+ * waiting on a matching register, the matrix of required register selects
+ * looks like this:
+ *
+ * | RCS | VCS | BCS | VECS
+ * -----+---------------------------+---------------------------+---------------------------+---------------------------
+ * RCS | MI_SEMAPHORE_SYNC_INVALID | MI_SEMAPHORE_SYNC_VR | MI_SEMAPHORE_SYNC_BR | MI_SEMAPHORE_SYNC_VER
+ * VCS | MI_SEMAPHORE_SYNC_RV | MI_SEMAPHORE_SYNC_INVALID | MI_SEMAPHORE_SYNC_BV | MI_SEMAPHORE_SYNC_VEV
+ * BCS | MI_SEMAPHORE_SYNC_RB | MI_SEMAPHORE_SYNC_VB | MI_SEMAPHORE_SYNC_INVALID | MI_SEMAPHORE_SYNC_VEB
+ * VECS | MI_SEMAPHORE_SYNC_RVE | MI_SEMAPHORE_SYNC_VVE | MI_SEMAPHORE_SYNC_BVE | MI_SEMAPHORE_SYNC_INVALID
+ *
+ * This distilled to integers looks like this:
+ *
+ * | 0 | 1 | 2 | 3
+ * --+-----+-----+-----+-----
+ * 0 | -1 | 0 | 2 | 1
+ * 1 | 2 | -1 | 0 | 1
+ * 2 | 0 | 2 | -1 | 1
+ * 3 | 2 | 1 | 0 | -1
+ *
+ * In the opposite direction, the same table showing register addresses is:
+ *
+ * | RCS | VCS | BCS | VECS
+ * -----+--------------+--------------+--------------+--------------
+ * RCS | GEN6_NOSYNC | GEN6_RVSYNC | GEN6_RBSYNC | GEN6_RVESYNC
+ * VCS | GEN6_VRSYNC | GEN6_NOSYNC | GEN6_VBSYNC | GEN6_VVESYNC
+ * BCS | GEN6_VRSYNC | GEN6_BVSYNC | GEN6_NOSYNC | GEN6_BVESYNC
+ * VECS | GEN6_VERSYNC | GEN6_VEVSYNC | GEN6_VEBSYNC | GEN6_NOSYNC
+ *
+ * Again this distilled to integers looks like this:
+ *
+ * | 0 | 1 | 2 | 3
+ * --+-----+-----+-----+-----
+ * 0 | -1 | 0 | 1 | 2
+ * 1 | 1 | -1 | 0 | 2
+ * 2 | 0 | 1 | -1 | 2
+ * 3 | 1 | 2 | 0 | -1
+ *
+ * The function gen6_sem_f expresses the above table. We also notice that the
+ * difference between the first and second tabe is only a transpose of ones to
+ * twos and twos to ones.
+ */
+
+static int gen6_sem_f(unsigned int x, unsigned int y)
gen6_sema_select
gen6_semaphore_flag
Pick one name to replace gen6_sem_f you mean?
Yes, you contracted semaphore to sema last time (and that stuck). And by
_f I assume you mean mathematical function, which is a little boring.
Yes, and boring or not it is wrong since it is not a pure mathematical
function so I will rename it.
+{
+ if (x == y)
+ return -1;
+
+ x = intel_engines[x].guc_id;
+ y = intel_engines[y].guc_id;
hw_id.
Some guys named Chris and Dave removed it. ;D
I did?... I was aiming at removing guc_id! I was aware Dave was arguing
against removing guc_id just in case the firmware differed in future
from the existing gen5+ id.
I retracted that comment later, just confused things.
+
+ if (x >= I915_NUM_GEN6_SEMAPHORE_ENGINES ||
+ y >= I915_NUM_GEN6_SEMAPHORE_ENGINES)
+ return -1;
+
/*
* X
* | 0 | 1 | 2 | 3
* --+-----+-----+-----+-----
* 0 | | 0 | 1 | 2
* Y 1 | 1 | | 0 | 2
* 2 | 0 | 1 | | 2
* 3 | 1 | 2 | 0 |
*/
You want another copy of the table here?
Yes. In particular, I need to know which axis is X and which is Y.
Having the table here is much easier to compare to the output of the
code (same screen).
Ok.
+ x -= x >= y;
+ if (y == 1)
+ x = 3 - x;
+ x += y & 1;
+ return x % 3;
+}
+
+u32 gen6_wait_mbox(enum intel_engine_id x, enum intel_engine_id y)
static...
It is called from intel_ringbuffer.c.
Hmm. This was in intel_ringbuffer.c, at least I assumed so as this only
applies to legacy submission, for gen6-7.
It uses the static intel_engines array since the dev_priv->engines are
not initialized yet by the time it runs, for an engine.
Could as an alternative make the engine init phase multi-pass. Maybe.
Not sure what repercussions for the cleanup path that would have.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx