On Thu, Feb 16, 2017 at 07:53:13AM +0000, Tvrtko Ursulin wrote: > > On 15/02/2017 16:33, Chris Wilson wrote: > >On Wed, Feb 15, 2017 at 04:06:34PM +0000, Tvrtko Ursulin wrote: > >>+static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset) > >>+{ > >>+ static const u32 pc6[6] = { GFX_OP_PIPE_CONTROL(6), 0, 0, 0, 0, 0 }; > >>+ > >>+ memcpy(batch, pc6, sizeof(pc6)); > >>+ > >>+ batch[1] = flags; > >>+ batch[2] = offset; > >>+ > >>+ return batch + 6; > > > >godbolt would seem to say it is best to use > >static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset) > >{ > > batch[0] = GFX_OP_PIPE_CONTROL(6); > > batch[1] = flags; > > batch[2] = offset; > > batch[3] = 0; > > batch[4] = 0; > > batch[5] = 0; > > > > return batch + 6; > >} > > Yeah agreed, it was a bit silly. I falsely remember it had quite > good effects on the optimisation gcc was able to do but couldn't > repro that. > > How about though replacing the last three assignments with > memset(&batch[3], 0, 3 * sizeof(u32))? That is indeed helpful on > 64-bit. Hah. Yes. Probably something to do with C preventing combining adjoining writes to memory? With memset it uses a *(uint64_t *)&batch[3] = 0, and we are not going to write that ugly code ourselves ;) Once we accept that gcc will inline the memset, it becomes equally as good just to use memset(batch, 0, 6*sizeof(u32)). Just need to double check that the kernel cflags don't prevent that magic. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx