Re: [PATCH v4 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers

"Siluvery, Arun" <arun.siluvery@xxxxxxxxxxxxxxx> · Wed, 17 Jun 2015 22:36:17 +0100

On 17/06/2015 21:21, Chris Wilson wrote:
On Wed, Jun 17, 2015 at 07:48:16PM +0100, Siluvery, Arun wrote:
On 16/06/2015 21:25, Chris Wilson wrote:
On Tue, Jun 16, 2015 at 08:25:20PM +0100, Arun Siluvery wrote:
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+				    uint32_t offset,
+				    uint32_t *num_dwords)
+{
+	uint32_t index;
+	struct page *page;
+	uint32_t *cmd;
+
+	page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
+	cmd = kmap_atomic(page);
+
+	index = offset;
+
+	/* FIXME: fill one cacheline with NOOPs.
+	 * Replace these instructions with WA
+	 */
+	while (index < (offset + 16))
+		cmd[index++] = MI_NOOP;
+
+	/*
+	 * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+	 * execution depends on the length specified in terms of cache lines
+	 * in the register CTX_RCS_INDIRECT_CTX
+	 */
+
+	kunmap_atomic(cmd);
+
+	if (index > (PAGE_SIZE / sizeof(uint32_t)))
+		return -EINVAL;

Check before you GPF!

You just overran the buffer and corrupted memory, if you didn't succeed
in trapping a segfault.

To be generic, align to the cacheline then check you have enough room
for your own data.
-Chris
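
A minimal userspace sketch of the suggestion above (align to the cacheline, then check for room before writing anything). The names wa_align_cacheline, wa_reserve and wa_fill_noops are hypothetical, and the 4 KiB page, the 16-dword cacheline and the zero NOOP value are assumptions made for illustration, not the driver's definitions:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define WA_PAGE_DWORDS   (4096 / sizeof(uint32_t)) /* assumes a 4 KiB page */
#define CACHELINE_DWORDS 16u                       /* assumes a 64-byte cacheline */
#define WA_NOOP          0u                        /* stand-in for MI_NOOP */

/* Round a dword index up to the next cacheline boundary. */
static uint32_t wa_align_cacheline(uint32_t index)
{
	return (index + CACHELINE_DWORDS - 1) & ~(CACHELINE_DWORDS - 1);
}

/*
 * Hypothetical helper: reserve room for 'count' dwords starting at the
 * first cacheline boundary at or after 'offset'. Nothing is written
 * here; the caller only writes once the reservation has succeeded.
 */
static int wa_reserve(uint32_t offset, uint32_t count, uint32_t *start)
{
	uint32_t aligned;

	if (offset > WA_PAGE_DWORDS)
		return -ENOSPC;

	aligned = wa_align_cacheline(offset);
	if (count > WA_PAGE_DWORDS - aligned)
		return -ENOSPC;

	*start = aligned;
	return 0;
}

/* Fill one cacheline with NOOPs, checking the bounds before writing. */
static int wa_fill_noops(uint32_t *cmd, uint32_t offset, uint32_t *num_dwords)
{
	uint32_t start, i;
	int ret = wa_reserve(offset, CACHELINE_DWORDS, &start);

	if (ret)
		return ret;

	for (i = 0; i < CACHELINE_DWORDS; i++)
		cmd[start + i] = WA_NOOP;

	*num_dwords = CACHELINE_DWORDS;
	return 0;
}
```

With this ordering an oversized request fails before any store happens, so the buffer is never overrun.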

Hi Chris,

The placement of the condition is not correct. I don't completely follow
your suggestion, though; could you please elaborate? Here we don't know
upfront how much more data will be written.

Hmm, are we anticipating an unbounded number of workarounds? At some
point you have to have a rough upper bound in order to do the bo
allocation. If we are really unsure, then we do need to split this into
two passes, one to count the number of dwords and the second to allocate
and actually fill the cmd[].
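
A rough userspace sketch of the two-pass idea described above: the same builder runs once with no buffer to count dwords, then again to fill an allocation of exactly that size. The struct wa_batch type and the wa_emit, wa_build and wa_build_two_pass names are hypothetical, and the builder only emits placeholder NOOPs (assumed to be zero) rather than real workarounds:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct wa_batch {
	uint32_t *cmd;	/* NULL during the counting pass */
	size_t index;	/* dwords emitted (or counted) so far */
};

/* Store the dword only when a buffer is attached; always count it. */
static void wa_emit(struct wa_batch *b, uint32_t dword)
{
	if (b->cmd)
		b->cmd[b->index] = dword;
	b->index++;
}

/* Hypothetical workaround list: one cacheline of placeholder NOOPs. */
static void wa_build(struct wa_batch *b)
{
	int i;

	for (i = 0; i < 16; i++)
		wa_emit(b, 0u /* stand-in for MI_NOOP */);
}

/* Returns a malloc'ed buffer of *num_dwords dwords, or NULL on failure. */
static uint32_t *wa_build_two_pass(size_t *num_dwords)
{
	struct wa_batch count = { NULL, 0 };
	struct wa_batch fill = { NULL, 0 };

	wa_build(&count);			/* pass 1: count only */

	fill.cmd = malloc(count.index * sizeof(uint32_t));
	if (!fill.cmd)
		return NULL;

	wa_build(&fill);			/* pass 2: fill the buffer */

	*num_dwords = fill.index;
	return fill.cmd;
}
```

Because both passes run the same builder, the counted size and the written size cannot drift apart; the cost is that every workaround has to go through wa_emit.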

Since we have a full page dedicated to this, that should be sufficient
for a good number of WA; if we need more than one page, it means we have
major issues.
The list for Gen8 is small, and the same goes for Gen9; a few more may get
added going forward, but not close to filling an entire page. Some of them
will even be restricted to specific steppings/revisions. For these
reasons I think a single-page setup is sufficient.
Do you anticipate any other use cases that require allocating more than
one page?

A two-pass approach can be implemented, but it adds complexity that
may not be required in this case. Please let me know your thoughts.

I have made the changes below to check before writing every command and
return an error as soon as we reach the end of the page.

/* Relies on 'index' being in scope; returns from the caller on overflow. */
#define wa_ctx_emit(batch, cmd) do {					\
		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
			kunmap_atomic(batch);				\
			return -ENOSPC;					\
		}							\
		(batch)[index++] = (cmd);				\
	} while (0)
Is this acceptable?
I think this is the only open issue; all other comments are addressed.
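
For illustration, the behaviour of the macro at the page boundary can be modelled outside the kernel roughly as follows. This is only a sketch: wa_ctx_emit_model and wa_emit_many are hypothetical names, the 4 KiB page size is an assumption, and WARN_ON and kunmap_atomic are left out because they have no userspace equivalent:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define WA_PAGE_DWORDS (4096 / sizeof(uint32_t)) /* assumes a 4 KiB page */

/*
 * Userspace model of wa_ctx_emit: relies on 'index' being in scope and
 * returns from the calling function when the page is full.
 */
#define wa_ctx_emit_model(batch, cmd) do {		\
		if (index >= WA_PAGE_DWORDS)		\
			return -ENOSPC;			\
		(batch)[index++] = (cmd);		\
	} while (0)

/* Hypothetical caller: emit 'count' placeholder dwords from 'offset'. */
static int wa_emit_many(uint32_t *batch, uint32_t offset, uint32_t count,
			uint32_t *num_dwords)
{
	uint32_t index = offset;
	uint32_t i;

	for (i = 0; i < count; i++)
		wa_ctx_emit_model(batch, 0u /* stand-in for MI_NOOP */);

	*num_dwords = index - offset;
	return 0;
}
```

The bounds check runs before each store, so the write that would land past the page never happens; dwords emitted before the failure remain in the buffer.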

It's the lesser of evils for sure. Still feel dubious that we don't know
upfront how much data we need to allocate.
Yes, but with a single-pass approach, do you see any way it can be improved?

regards
Arun

-Chris
