Re: [PATCH 2/2] drm/i915: Consolidate gen8_emit_pipe_control

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Thu, 16 Feb 2017 08:12:59 +0000

On Thu, Feb 16, 2017 at 07:53:13AM +0000, Tvrtko Ursulin wrote:
> 
> On 15/02/2017 16:33, Chris Wilson wrote:
> >On Wed, Feb 15, 2017 at 04:06:34PM +0000, Tvrtko Ursulin wrote:
> >>+static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset)
> >>+{
> >>+	static const u32 pc6[6] = { GFX_OP_PIPE_CONTROL(6), 0, 0, 0, 0, 0 };
> >>+
> >>+	memcpy(batch, pc6, sizeof(pc6));
> >>+
> >>+	batch[1] = flags;
> >>+	batch[2] = offset;
> >>+
> >>+	return batch + 6;
> >
> >godbolt would seem to say it is best to use
> >static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset)
> >{
> >	batch[0] = GFX_OP_PIPE_CONTROL(6);
> >	batch[1] = flags;
> >	batch[2] = offset;
> >	batch[3] = 0;
> >	batch[4] = 0;
> >	batch[5] = 0;
> >
> >	return batch + 6;
> >}
> 
> Yeah agreed, it was a bit silly. I falsely remember it had quite
> good effects on the optimisation gcc was able to do but couldn't
> repro that.
> 
> How about though replacing the last three assignments with
> memset(&batch[3], 0, 3 * sizeof(u32))? That is indeed helpful on
> 64-bit.

Hah. Yes. Probably something to do with C preventing the compiler from
combining adjacent writes to memory? With memset it emits the equivalent
of *(uint64_t *)&batch[3] = 0, and we are not going to write that ugly
code ourselves ;)

Once we accept that gcc will inline the memset, it is just as good to
use memset(batch, 0, 6 * sizeof(u32)) and then fill in the first three
dwords. We just need to double check that the kernel cflags don't
prevent that magic.
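
For reference, a minimal userspace sketch of the two memset variants
under discussion. The function names, the u32 typedef and the
GFX_OP_PIPE_CONTROL() definition below are stand-ins made up for
illustration, not the driver's own; the point is only that both variants
fill the six dwords identically, so the choice comes down to the code
gcc generates for each.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

/* Stand-in for the i915 macro; the exact encoding here is an assumption. */
#define GFX_OP_PIPE_CONTROL(len) \
	((0x3u << 29) | (0x3u << 27) | (0x2u << 24) | ((len) - 2))

/* Variant A (hypothetical name): explicit header stores, memset for the tail. */
static inline u32 *emit_pc_memset_tail(u32 *batch, u32 flags, u32 offset)
{
	batch[0] = GFX_OP_PIPE_CONTROL(6);
	batch[1] = flags;
	batch[2] = offset;
	memset(&batch[3], 0, 3 * sizeof(u32));

	return batch + 6;
}

/* Variant B (hypothetical name): zero all six dwords, then set the header. */
static inline u32 *emit_pc_memset_all(u32 *batch, u32 flags, u32 offset)
{
	memset(batch, 0, 6 * sizeof(u32));
	batch[0] = GFX_OP_PIPE_CONTROL(6);
	batch[1] = flags;
	batch[2] = offset;

	return batch + 6;
}
```

Whether the memset is actually expanded inline depends on the compiler
version and flags, so the generated assembly (e.g. from gcc -O2 -S)
would need comparing with the kernel's own cflags before picking one.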
-Chris

--
Chris Wilson, Intel Open Source Technology Centre