On Wed, Jun 19, 2019 at 11:55:41PM +0000, Vineet Gupta wrote:
> On 6/19/19 1:12 AM, Peter Zijlstra wrote:

> > I'm assuming you've looked at what x86 currently does and found
> > something like that doesn't work for ARC?
>
> Just looked at x86 code and it seems similar

I think you missed a bit.

> >>> +	WRITE_ONCE(*instr_addr, instr);
> >>> +	flush_icache_range(entry->code, entry->code + JUMP_LABEL_NOP_SIZE);

> > So do you have a 2 byte opcode that traps unconditionally? In that case
> > I'm thinking you could do something like x86 does. And it would avoid
> > that NOP padding you do to get the alignment.
>
> Just to be clear there is no trapping going on in the canonical sense of it. There
> are regular instructions for NO-OP and Branch.
> We do have 2 byte opcodes for both but as described the branch offset is too
> limited so not usable.

In particular we do not need the alignment.

So what the x86 code does is:

 - overwrite the first byte of the instruction with a single byte trap
   instruction

 - machine wide IPI which synchronizes I$

At this point, any CPU that encounters this instruction will trap; and
the trap handler will emulate the 'new' instruction -- typically a jump.

 - overwrite the tail of the instruction (if there is a tail)

 - machine wide IPI which synchronizes I$

At this point, nobody will execute the tail, because we'll still trap on
that first single byte instruction, but if they were to read the
instruction stream, the tail must be there.

 - overwrite the first byte of the instruction to now have a complete
   instruction.

 - machine wide IPI which synchronizes I$

At this point, any CPU will encounter the new instruction as a whole,
irrespective of alignment.

So the benefit of this scheme is that it works irrespective of the
instruction fetch window size and doesn't need the 'funny' alignment
stuff.

Now, I've no idea if something like this is feasible on ARC; for it to work you need that 2 byte trap instruction -- since all instructions are 2 byte aligned, you can always poke that without issue.
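
To make the ordering of the x86 sequence described above concrete, here is a
rough user-space sketch. It is only a single-threaded simulation on a byte
buffer, not kernel code: poke_instruction() and sync_all_cpus() are
hypothetical names made up for illustration, the "IPI" is just a counter, and
0xCC is the x86 int3 opcode standing in for the single byte trap. The real
thing (text_poke_bp() in arch/x86/kernel/alternative.c) also has to deal with
the trap handler and page mappings, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRAP_OPCODE 0xCC	/* x86 int3: a single byte trap instruction */

/* Counts the simulated machine wide I$ synchronization IPIs. */
static int sync_count;

/* Hypothetical stand-in for the machine wide IPI which synchronizes I$. */
static void sync_all_cpus(void)
{
	sync_count++;
}

/*
 * Hypothetical helper: replace the len-byte instruction at addr with
 * new_insn, following the three-step order from the mail.
 */
static void poke_instruction(uint8_t *addr, const uint8_t *new_insn, size_t len)
{
	assert(len >= 1);

	/* Step 1: the first byte becomes a trap; everybody hitting it traps. */
	addr[0] = TRAP_OPCODE;
	sync_all_cpus();

	/* Step 2: write the tail, if any; the leading trap still guards it. */
	if (len > 1) {
		memcpy(addr + 1, new_insn + 1, len - 1);
		sync_all_cpus();
	}

	/* Step 3: replace the trap with the real first byte. */
	addr[0] = new_insn[0];
	sync_all_cpus();
}
```

Calling it on a 5 byte NOP with a 5 byte jmp as the new instruction leaves the
buffer holding the jmp after three synchronization points. An ARC variant
would presumably poke a 2 byte trap instead of a single byte, if such an
opcode exists.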