Hi, Dean.
My initial choice of compiler options was just -O0.
That produced the jmp insertion problem, so I conjectured
that some alignment requirement or preference was causing
jmp instructions to be added so that each labeled fragment
would start on an "appropriate" boundary.
Clearly, the no-align options did not help.
So, I just tried out your suggestion:
#define OP(nm, cod) \
FS##nm: cod \
asm("nop" : : ); \
FE##nm:
This has the effect of inserting a NOP at the end of each code fragment.
And, it DOES appear to work (although I only quickly eyeballed
the asm code, so I might be missing something). I'll give
it a more careful workover tomorrow. (That was WITH the current
-fno-align-* options still active.)
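For anyone trying the same trick, here is a minimal sketch of the macro in use, assuming GCC (or clang) with the labels-as-values extension. Two changes from the macro quoted above are assumptions of this sketch, not part of Dean's suggestion: the `__asm__` spelling, which GCC also accepts under strict -std=c99, and an empty statement after the `FE##nm` label, because before C23 a label must be followed by a statement, so an OP() placed last in a block would otherwise fail to compile. `FRAG_LEN`, `add_pair` and the fragment names are hypothetical.

```c
/* Sketch only: needs GCC or clang (labels as values, GNU asm syntax).
 * Fragment bodies passed as 'cod' are assumed to end in ';'. */
#include <assert.h>
#include <stddef.h>

/* Each fragment is bracketed by FS<nm>/FE<nm> labels and ends with a NOP.
 * The ';' after FE<nm> is an empty statement, so the macro stays legal
 * even when it is the last thing in a block (pre-C23 rule). */
#define OP(nm, cod)              \
    FS##nm: cod                  \
    __asm__ __volatile__("nop"); \
    FE##nm: ;

/* Byte distance between a fragment's end and start labels. Only
 * meaningful if GCC keeps the blocks in source order (in practice, -O0). */
#define FRAG_LEN(nm)                              \
    ((size_t)((const unsigned char *)&&FE##nm -   \
              (const unsigned char *)&&FS##nm))

/* Hypothetical example: two fragments computing a + b; the byte length
 * of each fragment (including its trailing NOP) is stored in lens[]. */
static long add_pair(long a, long b, size_t lens[2])
{
    long r;
    assert(lens != NULL);
    OP(Copy, r = a;)
    OP(Add, r += b;)
    lens[0] = FRAG_LEN(Copy);
    lens[1] = FRAG_LEN(Add);
    return r;
}
```

Usage: `size_t lens[2]; long r = add_pair(2, 40, lens);` gives r == 42. Nothing in the sketch depends on the measured lengths being sensible, since that depends on block layout.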
Now, what was it that led you to propose that inserting a NOP
would have the desired effect?
Many thanks for your reply!
Robert
Dean Anderson wrote:
I suspect it does this because of instruction alignment and pipelining
issues. Why are you trying to turn off alignment?
You might try adding a nop after each one.
--Dean
On Tue, 1 Sep 2009, Robert Bernecky wrote:
I'm trying to get gcc version 4.3.2 to emit x86-64 code
fragments that I can catenate to perform my own JIT
compilation, but the compiler is being recalcitrant.
(I was using a jump table, but its performance was underwhelming.)
Roughly, what I've done is to create a set of code fragments,
with labels so that I can determine their addresses (via &&label)
and lengths. E.g.,
topLoad1: reg1 = x[i];
botLoad1:
topLoad2: reg2 = y[i];
botLoad2:
topAdd: regz = reg1 + reg2;
botAdd:
topStore: z[i] = regz;
botStore:
Then, I have a table of fragment addresses (topLoad1, topLoad2, etc.)
and lengths (botLoad1-topLoad1, botLoad2-topLoad2, etc.), and a
list of fragments, not known statically, to be assembled into
working code, e.g.:
(Load2, Load1, Add, Store, Loop)
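The catenation step can be sketched as follows, assuming each fragment is position-independent once copied. The struct, the enum and `assemble` are hypothetical names. In the real program each start/len pair would come from a `&&top`/`&&bot` label pair; the table can just as well be filled from ordinary byte arrays, which lets the copying logic be checked without executing anything.

```c
/* Sketch of the catenation step only; making the buffer executable and
 * jumping into it are out of scope here. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One code fragment: its first byte and its length in bytes. */
struct fragment {
    const unsigned char *start;
    size_t len;
};

/* Hypothetical fragment IDs matching the names used above. */
enum frag_id { LOAD1, LOAD2, ADD, STORE, LOOP, FRAG_COUNT };

/* Copies the fragments named in seq[0..n-1], in that order, into buf.
 * Returns the number of bytes written, or 0 if buf (cap bytes) is too small. */
static size_t assemble(unsigned char *buf, size_t cap,
                       const struct fragment table[FRAG_COUNT],
                       const enum frag_id *seq, size_t n)
{
    size_t used = 0;
    for (size_t k = 0; k < n; k++) {
        assert(seq[k] < FRAG_COUNT);
        const struct fragment *f = &table[seq[k]];
        if (f->len > cap - used)   /* used <= cap always holds here */
            return 0;              /* would overflow the code buffer */
        memcpy(buf + used, f->start, f->len);
        used += f->len;
    }
    return used;
}
```

Not shown: making the buffer executable (on POSIX, for example with mmap() or mprotect() and PROT_EXEC) and the final jump into it. Also, any fragment containing a RIP-relative reference, or a relative call or jmp to code outside itself, will point to the wrong place after copying and would need patching.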
I assemble the fragments into a code buffer and jump to it,
or so the story goes. Unfortunately, what I'm seeing in the
generated code fragments is not fun:
1. GCC sometimes, but NOT always, inserts jumps to the next
fragment. E.g.:
----------------------------------------------
.L46:
.loc 2 34 0
movq -264(%rbp), %rax
movq %rax, -40(%rbp)
.L47:
.L7:
.loc 2 40 0
movl %r8d, %eax
jmp .L48
.L6:
.L48:
.loc 2 43 0
movl %r11d, %ecx
.L49:
.L50:
----------------------------------------------
Note the jmp .L48. If GCC always inserted a jump, I could
remove it; if it never inserted one, I'd be even happier.
But it only does so now and then. I tried adding my own
jumps to force this:
topLoad2: reg2 = y[i];
goto botLoad2;
botLoad2:
but GCC removed them. And inserted others.
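If, as with `jmp .L48` in the listing above, an inserted jump only skips over labels, its displacement should be zero. Such a jump is harmless once copied, since it still lands on the next byte, but it costs a taken branch per fragment. A heuristic for stripping that case from a fragment's bytes might look like this sketch. It assumes x86-64, that the assembler chose either the short (EB 00) or near (E9 00 00 00 00) encoding, and that the fragment ends on an instruction boundary; `trim_trailing_nop_jmp` is a hypothetical name. Matching bytes backwards can misfire when the pattern is really the tail of a longer instruction (an immediate operand, say), so a robust version would decode the fragment forwards with a disassembler.

```c
/* Heuristic sketch: removes a trailing "jump to the next instruction"
 * (displacement 0) from an x86-64 code fragment. */
#include <assert.h>
#include <stddef.h>

/* Returns the fragment length with a trailing zero-displacement jmp removed:
 *   EB 00           jmp rel8,  displacement 0
 *   E9 00 00 00 00  jmp rel32, displacement 0
 * Jumps with any other displacement are kept, since removing them would
 * change control flow. */
static size_t trim_trailing_nop_jmp(const unsigned char *code, size_t len)
{
    assert(code != NULL || len == 0);
    if (len >= 5 && code[len - 5] == 0xE9 && code[len - 4] == 0x00 &&
        code[len - 3] == 0x00 && code[len - 2] == 0x00 &&
        code[len - 1] == 0x00)
        return len - 5;
    if (len >= 2 && code[len - 2] == 0xEB && code[len - 1] == 0x00)
        return len - 2;
    return len;
}
```

For example, the bytes 44 89 C0 EB 00 (`movl %r8d, %eax` followed by a short jmp to the next byte) trim to a length of 3.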
Today, I'm using these compiler options:
gcc -O0 -ggdb -mtune=opteron -fno-align-labels -fno-align-jumps
So, I welcome suggestions on how to solve or work around these
problems. Or even a completely different approach.
Thanks,
Robert