On 15/06/15 15:10, Siluvery, Arun wrote: > On 12/06/2015 18:03, Dave Gordon wrote: >> On 12/06/15 12:58, Siluvery, Arun wrote: >>> On 09/06/2015 19:43, Dave Gordon wrote: >>>> On 05/06/15 14:57, Arun Siluvery wrote: >>>>> In Per context w/a batch buffer, >>>>> WaRsRestoreWithPerCtxtBb >>>>> >>>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and >>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions >>>>> so as to not break any future users of existing definitions (Michel) >>>>> >>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@xxxxxxxxx> >>>>> Signed-off-by: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx> >>>>> --- >>>>> drivers/gpu/drm/i915/i915_reg.h | 26 ++++++++++++++++++ >>>>> drivers/gpu/drm/i915/intel_lrc.c | 59 >>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>> 2 files changed, 85 insertions(+) [snip] >>>>> + /* >>>>> + * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and >>>>> + * MI_BATCH_BUFFER_END instructions in this sequence need to be >>>>> + * in the same cacheline. >>>>> + */ >>>>> + while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0) >>>>> + cmd[index++] = MI_NOOP; >>>>> + >>>>> + cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 | >>>>> + MI_LRM_USE_GLOBAL_GTT | >>>>> + MI_LRM_ASYNC_MODE_ENABLE; >>>>> + cmd[index++] = INSTPM; >>>>> + cmd[index++] = scratch_addr; >>>>> + cmd[index++] = 0; >>>>> + >>>>> + /* >>>>> + * BSpec says there should not be any commands programmed >>>>> + * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so >>>>> + * do not add any new commands >>>>> + */ >>>>> + cmd[index++] = MI_LOAD_REGISTER_REG_GEN8; >>>>> + cmd[index++] = GEN8_RS_PREEMPT_STATUS; >>>>> + cmd[index++] = GEN8_RS_PREEMPT_STATUS; >>>>> + >>>>> /* padding */ >>>>> while (index < end) >>>>> cmd[index++] = MI_NOOP; >>>>> >>>> >>>> Where's the MI_BATCH_BUFFER_END referred to in the comment? >>> >>> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6]. >>> Since the diff context is only few lines it didn't showup in the diff. >> >> The second comment above says "no commands between LOAD_REG_REG and >> BB_END", so the point of my comment was that the BB_END is *NOT* >> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there! > > true, but they are no-ops so they shouldn't really affect anything. I > guess the spec means no valid commands. I guess the spec means "NO COMMANDS". NOOP is a perfectly valid command, and I've even seen cases where a workaround specifically requires "a NOOP with the set-no-op-id-register bit set" to fix some particular bug. The only special thing about NOOP is that it doesn't get captured in IPEHR. I think the w/a requires this: 0%CLSIZE: ... LRM (reg, addr, 0) LRR (reg, reg) BB_END ... (63%CLSIZE) no gaps, no insertions, all together, all on one cacheline. Those instructions take up 8 DWords (32 bytes) so the sequence doesn't necessarily have to start on a cacheline boundary, as long as it's entirely within the same line. But it's simpler to start on a new line. You seem to have: 0%CLSIZE: LRM (reg, mem, 0) LRR (reg, reg) NOP NOP NOP BB_END so the condition in the comment is not fulfilled. If this works, maybe the comment is wrong. >> And therefore also, these instructions do *not* all end up in the same >> cacheline, thus contradicting the first comment above. > > I don't understand why. As per the requirement the commands from the > first MI_LOAD_REGISTER_MEM_GEN8 (after while) through BB_END will be > part of same cacheline (in this case second line). OK, they're all in the same line; I didn't look back at the full context enough and thought 'end' would point to the end of the buffer, rather than the end of the cacheline .. because it /does/ point to the end of the buffer, it just happens to be the end of the very same cacheline as well. So I really don't like the way the sizes of the two workaround batches have been defined in terms of cache lines. Also I'm not keen on one bit of code allocating the object and defining the sizes of the sub-areas within it, and then separate functions filling in each of the sequences within these areas, "knowing" that the areas are /just the right size/. It would be simpler to maintain if the "size in cachelines" values in lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync with the number of instructions written by gen8_init_perctx_bb() and gen8_init_indirectctx_bb(). How about having each of these return the number of bytes they've appended to the (u32 *)buffer that they've been given, and let the caller manage mapping/unmapping, alignment, padding, etc, and fill in the size fields accordingly *after* the content has been defined? .Dave. >> Padding *after* a BB_END would be redundant. > > yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of > abruptly terminating the batch which is why I am padding with no-ops, I > can change this if that is preferred. >> >> .Dave. _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx