On Thu, Jul 21, 2016 at 02:16:22PM +0100, Tvrtko Ursulin wrote: > > On 21/07/16 13:59, Chris Wilson wrote: > >On Thu, Jul 21, 2016 at 01:00:47PM +0100, Tvrtko Ursulin wrote: > >>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >> > >>Static table wastes space for invalid combinations and > >>engines which are not supported by Gen6 (legacy semaphores). > >> > >>Replace it with a function devised by Dave Gordon. > >> > >>I have verified that it generates the same mappings between > >>mbox selectors and signalling registers. > > > >So just how big was that table? How big are the functions replacing it? > > With I915_NUM_ENGINES of 5 table is 5 * 5 * (2 * 4) = 200 bytes. > > With the patch .text grows by 144 bytes here and .rodata shrinks by > 256. So a net gain of 112 bytes with my config. Conclusion is that > as long as we got five engines it is not that interesting to get rid > of the table. > > >>v2: Add a comment describing what gen6_sem_f does. > >>v3: This time with git add. > > > >I like having the table a lot... Even if we don't find the function > >convincing we should add that comment. > > > >>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>Cc: Dave Gordon <david.s.gordon@xxxxxxxxx> > >>Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>--- > >> drivers/gpu/drm/i915/i915_reg.h | 7 +-- > >> drivers/gpu/drm/i915/intel_engine_cs.c | 93 +++++++++++++++++++++++++++++++++ > >> drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +------------- > >> drivers/gpu/drm/i915/intel_ringbuffer.h | 3 ++ > >> 4 files changed, 102 insertions(+), 41 deletions(-) > >> > >>diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h > >>index 9397ddec26b9..c2fe718582c8 100644 > >>--- a/drivers/gpu/drm/i915/i915_reg.h > >>+++ b/drivers/gpu/drm/i915/i915_reg.h > >>@@ -1604,9 +1604,10 @@ enum skl_disp_power_wells { > >> #define RING_HEAD(base) _MMIO((base)+0x34) > >> #define RING_START(base) _MMIO((base)+0x38) > >> #define RING_CTL(base) _MMIO((base)+0x3c) > >>-#define RING_SYNC_0(base) _MMIO((base)+0x40) > >>-#define RING_SYNC_1(base) _MMIO((base)+0x44) > >>-#define RING_SYNC_2(base) _MMIO((base)+0x48) > >>+#define RING_SYNC(base, n) _MMIO((base) + 0x40 + (n) * 4) > >>+#define RING_SYNC_0(base) RING_SYNC(base, 0) > >>+#define RING_SYNC_1(base) RING_SYNC(base, 1) > >>+#define RING_SYNC_2(base) RING_SYNC(base, 2) > >> #define GEN6_RVSYNC (RING_SYNC_0(RENDER_RING_BASE)) > >> #define GEN6_RBSYNC (RING_SYNC_1(RENDER_RING_BASE)) > >> #define GEN6_RVESYNC (RING_SYNC_2(RENDER_RING_BASE)) > >>diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > >>index f4a35ec78481..19455b20b322 100644 > >>--- a/drivers/gpu/drm/i915/intel_engine_cs.c > >>+++ b/drivers/gpu/drm/i915/intel_engine_cs.c > >>@@ -209,3 +209,96 @@ int intel_engine_init_common(struct intel_engine_cs *engine) > >> > >> return i915_cmd_parser_init_ring(engine); > >> } > >>+ > >>+#define I915_NUM_GEN6_SEMAPHORE_ENGINES (4) > >>+ > >>+/* > >>+ * For Gen6 semaphores where the driver issues MI_SEMAPHORE_MBOX commands > >>+ * with register selects so that a specific engine can wake up another engine > >>+ * waiting on a matching register, the matrix of required register selects > >>+ * looks like this: > >>+ * > >>+ * | RCS | VCS | BCS | VECS > >>+ * -----+---------------------------+---------------------------+---------------------------+--------------------------- > >>+ * RCS | MI_SEMAPHORE_SYNC_INVALID | MI_SEMAPHORE_SYNC_VR | MI_SEMAPHORE_SYNC_BR | MI_SEMAPHORE_SYNC_VER > >>+ * VCS | MI_SEMAPHORE_SYNC_RV | MI_SEMAPHORE_SYNC_INVALID | MI_SEMAPHORE_SYNC_BV | MI_SEMAPHORE_SYNC_VEV > >>+ * BCS | MI_SEMAPHORE_SYNC_RB | MI_SEMAPHORE_SYNC_VB | MI_SEMAPHORE_SYNC_INVALID | MI_SEMAPHORE_SYNC_VEB > >>+ * VECS | MI_SEMAPHORE_SYNC_RVE | MI_SEMAPHORE_SYNC_VVE | MI_SEMAPHORE_SYNC_BVE | MI_SEMAPHORE_SYNC_INVALID > >>+ * > >>+ * This distilled to integers looks like this: > >>+ * > >>+ * | 0 | 1 | 2 | 3 > >>+ * --+-----+-----+-----+----- > >>+ * 0 | -1 | 0 | 2 | 1 > >>+ * 1 | 2 | -1 | 0 | 1 > >>+ * 2 | 0 | 2 | -1 | 1 > >>+ * 3 | 2 | 1 | 0 | -1 > >>+ * > >>+ * In the opposite direction, the same table showing register addresses is: > >>+ * > >>+ * | RCS | VCS | BCS | VECS > >>+ * -----+--------------+--------------+--------------+-------------- > >>+ * RCS | GEN6_NOSYNC | GEN6_RVSYNC | GEN6_RBSYNC | GEN6_RVESYNC > >>+ * VCS | GEN6_VRSYNC | GEN6_NOSYNC | GEN6_VBSYNC | GEN6_VVESYNC > >>+ * BCS | GEN6_VRSYNC | GEN6_BVSYNC | GEN6_NOSYNC | GEN6_BVESYNC > >>+ * VECS | GEN6_VERSYNC | GEN6_VEVSYNC | GEN6_VEBSYNC | GEN6_NOSYNC > >>+ * > >>+ * Again this distilled to integers looks like this: > >>+ * > >>+ * | 0 | 1 | 2 | 3 > >>+ * --+-----+-----+-----+----- > >>+ * 0 | -1 | 0 | 1 | 2 > >>+ * 1 | 1 | -1 | 0 | 2 > >>+ * 2 | 0 | 1 | -1 | 2 > >>+ * 3 | 1 | 2 | 0 | -1 > >>+ * > >>+ * The function gen6_sem_f expresses the above table. We also notice that the > >>+ * difference between the first and second tabe is only a transpose of ones to > >>+ * twos and twos to ones. > >>+ */ > >>+ > >>+static int gen6_sem_f(unsigned int x, unsigned int y) > > > >gen6_sema_select > >gen6_semaphore_flag > > Pick one name to replace gen6_sem_f you mean? Yes, you contracted semaphore to sema last time (and that stuck). And by _f I assume you mean mathematical function, which is a little boring. > >>+{ > >>+ if (x == y) > >>+ return -1; > >>+ > >>+ x = intel_engines[x].guc_id; > >>+ y = intel_engines[y].guc_id; > > > >hw_id. > > Some guys named Chris and Dave removed it. ;D I did?... I was aiming at removing guc_id! I was aware Dave was arguing against removing guc_id just in case the firmware differed in future from the existing gen5+ id. > >>+ > >>+ if (x >= I915_NUM_GEN6_SEMAPHORE_ENGINES || > >>+ y >= I915_NUM_GEN6_SEMAPHORE_ENGINES) > >>+ return -1; > >>+ > > > >/* > > * X > > * | 0 | 1 | 2 | 3 > > * --+-----+-----+-----+----- > > * 0 | | 0 | 1 | 2 > > * Y 1 | 1 | | 0 | 2 > > * 2 | 0 | 1 | | 2 > > * 3 | 1 | 2 | 0 | > > */ > > You want another copy of the table here? Yes. In particular, I need to know which axis is X and which is Y. Having the table here is much easier to compare to the output of the code (same screen). > >>+ x -= x >= y; > >>+ if (y == 1) > >>+ x = 3 - x; > >>+ x += y & 1; > >>+ return x % 3; > >>+} > >>+ > >>+u32 gen6_wait_mbox(enum intel_engine_id x, enum intel_engine_id y) > > > >static... > > It is called from intel_ringbuffer.c. Hmm. This was in intel_ringbuffer.c, at least I assumed so as this only applies to legacy submission, for gen6-7. > >>+{ > >>+ int r; > >>+ > >>+ r = gen6_sem_f(x, y); > >>+ if (r < 0) > >>+ return MI_SEMAPHORE_SYNC_INVALID; > >>+ > >>+ if (r == 1) > >>+ r = 2; > >>+ else if (r == 2) > >>+ r = 1; > >>+ > >>+ return r << 16; > > > >/* Convert semaphore sync field to its wait flag */ > >switch (gen6_sem_f(x, y)) { > >case 0: return 0; > >case 1: return 2 << 16; > >case 2: return 1 << 16; > >default: eturn MI_SEMAPHORE_SYNC_INVALID; > > > >} > > Meh. :) Bike-shedding territory. I know which I found easier to understand ;) -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx