Re: Poor man's JIT compiler

Robert Bernecky <bernecky@xxxxxxxxxxxxxxx> · Wed, 02 Sep 2009 18:32:29 -0400

Hi, Dean.

My initial attempt at compiler options was just -O0.

That resulted in the jmp insertion problem, so I conjectured
that there might be some alignment requirements/desires that
would result in jmp instructions being added to make each
labeled fragment start on an "appropriate" boundary.
Clearly, the no-align options did not help.

So, I just tried out your suggestion:

#define OP(nm, cod)  \
FS##nm: cod          \
   asm("nop" : : );  \
FE##nm:

This has the effect of inserting a NOP at the end of each code fragment.
And, it DOES appear to work (although I just quickly eyeballed
the asm code, so I might be missing something). I'll give
it a more careful workover tomorrow. (That was WITH the current
no-align options still active.)
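
For anyone following along, here is a minimal sketch of how a macro of that
shape might be combined with GCC's labels-as-values extension (&&label) to
record each fragment's address and length. The names struct frag, demo_tab
and build_frag_table are made up for illustration and are not from my real
code. GCC does not promise that the two labels bracket exactly the code
between them (which is the very problem in this thread), so treat the
lengths as something to check against the asm listing, not as a guarantee.

```c
#include <stddef.h>

/* Hypothetical fragment descriptor; not from the original post. */
struct frag { const char *start; ptrdiff_t len; };

/* Hypothetical table that the demo function fills in. */
static struct frag demo_tab[2];

/* Same shape as the macro above: start label, body, NOP, end label. */
#define OP(nm, cod)            \
    FS##nm: cod                \
    __asm__ volatile ("nop");  \
    FE##nm:

/* Records address/length pairs in tab[] and returns the fragment count. */
int build_frag_table(struct frag *tab)
{
    /* volatile so the compiler keeps the loads and stores in the bodies */
    volatile int x = 1, y = 2, z = 0;

    /* &&label is a GCC extension that yields the label's address. */
    tab[0].start = (const char *)&&FSload;
    tab[0].len   = (const char *)&&FEload - (const char *)&&FSload;
    tab[1].start = (const char *)&&FSadd;
    tab[1].len   = (const char *)&&FEadd - (const char *)&&FSadd;

    OP(load, z = x;)
    OP(add,  z = z + y;)
    ;   /* a label must be followed by a statement */
    return 2;
}
```

Calling build_frag_table(demo_tab) fills both entries; the lengths it
reports depend on the compiler version and options, so no particular values
should be expected.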

Now, what was it that led you to propose that inserting a NOP
would have the desired effect?

Many thanks for your reply!
Robert

Dean Anderson wrote:
I suspect it does this because of instruction alignment and pipelining
issues.   Why are you trying to turn off alignment?

You might try adding a nop after each one. 

		--Dean

On Tue, 1 Sep 2009, Robert Bernecky wrote:

I'm trying to get gcc version 4.3.2 to emit X86-64 code
fragments that I can catenate to perform my own JIT
compilation, but the compiler is being recalcitrant.

(I was using a jump table, but its performance was underwhelming.)

Roughly, what I've done is to create a set of code fragments,
with labels so that I can determine their addresses (via &&label)
and lengths. E.g.,

topLoad1:  reg1 = x[i];
botLoad1:

topLoad2:  reg2 = y[i];
botLoad2:

topAdd:    regz = reg1 + reg2;
botAdd:

topStore:  z[i] = regz;
botStore:

Then, I have a table of fragment addresses (topLoad1, topLoad2, etc.)
and lengths (botLoad1-topLoad1, botLoad2-topLoad2, etc.), and a
(statically unknown) list of fragments to be assembled to build
working code, e.g.:

  (Load2, Load1, Add, Store, Loop)
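
A minimal sketch of that table-driven catenation, assuming a descriptor
type and helper that are purely illustrative (struct frag, assemble, and
the demo_* arrays are hypothetical names). The stand-in fragment bodies
are just the x86-64 NOP and RET opcode bytes, treated as data so that the
copying logic can be checked without executing anything.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: start address and byte length of one fragment. */
struct frag { const unsigned char *start; size_t len; };

/*
 * Copy the fragments named by order[0..n-1] into buf, back to back.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t assemble(unsigned char *buf, size_t cap,
                const struct frag *tab, const int *order, size_t n)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        const struct frag *f = &tab[order[i]];
        if (f->len > cap - used)
            return 0;                     /* fragment would overflow buf */
        memcpy(buf + used, f->start, f->len);
        used += f->len;
    }
    return used;
}

/* Stand-in fragment bodies (x86-64 NOP and RET bytes), used only as data. */
static const unsigned char demo_nop[] = { 0x90 };
static const unsigned char demo_ret[] = { 0xC3 };
static const struct frag demo_tab[] = { { demo_nop, 1 }, { demo_ret, 1 } };
static const int demo_order[] = { 0, 0, 1 };   /* list chosen at run time */
static unsigned char demo_buf[16];
```

Here assemble(demo_buf, sizeof demo_buf, demo_tab, demo_order, 3) writes
the three bytes 90 90 C3. To actually jump to such a buffer it would have
to live in executable memory (for example, a region obtained from mmap
with PROT_EXEC), which this sketch leaves out.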

I assemble the fragments into a code buffer and jump to it,
or so the story goes. Unfortunately, what I'm seeing in the
generated code fragments is not fun:

1. GCC sometimes, but NOT always, inserts jumps to the next
    fragment. E.g.:

----------------------------------------------

.L46:
         .loc 2 34 0
         movq    -264(%rbp), %rax
         movq    %rax, -40(%rbp)
.L47:
.L7:
         .loc 2 40 0
         movl    %r8d, %eax
         jmp     .L48
.L6:
.L48:
         .loc 2 43 0
         movl    %r11d, %ecx
.L49:
.L50:
----------------------------------------------

Note the jmp .L48. If GCC always inserted a jump, I could
remove it, or if it never inserted the jump, I'd be even
happier, but it only does it now and then. I tried adding
my own jumps to force this:

topLoad2:  reg2 = y[i];
            goto botLoad2;
botLoad2:

but GCC removed them. And inserted others.

Today, I'm using these compiler options:

gcc  -O0 -ggdb -mtune=opteron -fno-align-labels -fno-align-jumps

So, I welcome suggestions on how to solve or work around these
problems. Or even a completely different approach.

Thanks,
Robert