Thanks for the hand, Dean.
I've run out of time to work on this until next month,
but want you to know that I appreciate you (and others) taking your
time to reply to me on this topic.
-fPIC seems to help a fair bit on the embedded jmp problem.
In summary, where I think I'm at with threaded/JIT compilation with
label pointers now is:
1. Using label pointers with jumps works, but is
quite slow, particularly when the code fragment
sizes are small (which they are in my case).
2. Making the compiler not generate jmp statements in
such code fragments is tricky, at best, and is going
to be a fragile area, in the sense of being very sensitive
to compiler changes, code fragment contents, etc.
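For reference, option 1 above (dispatching through label pointers) could be sketched roughly as follows. This is a minimal illustration and not the actual code under discussion: the opcode names, the dispatch table, and run_threaded are all hypothetical, and the && and goto * forms are GNU C extensions. Each fragment ends in an indirect jump, which is the per-fragment overhead that hurts when fragments are small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny vector-add loop body. */
enum { OP_LOAD1, OP_LOAD2, OP_ADD, OP_STORE, OP_HALT };

/* Runs the opcode list `prog` (which must end with OP_HALT) once for
   each i in [0, n), dispatching through label pointers. */
void run_threaded(const int *prog, const long *x, const long *y,
                  long *z, size_t n)
{
    /* Table indexed by opcode; &&label is a GCC/Clang extension. */
    static void *const dispatch[] = {
        &&do_load1, &&do_load2, &&do_add, &&do_store, &&do_halt
    };
    long reg1 = 0, reg2 = 0, regz = 0;

    assert(prog != NULL);
    for (size_t i = 0; i < n; i++) {
        const int *pc = prog;
        goto *dispatch[*pc++];          /* enter the first fragment */
        /* Every fragment ends with an indirect jump to the next one. */
        do_load1: reg1 = x[i];          goto *dispatch[*pc++];
        do_load2: reg2 = y[i];          goto *dispatch[*pc++];
        do_add:   regz = reg1 + reg2;   goto *dispatch[*pc++];
        do_store: z[i] = regz;          goto *dispatch[*pc++];
        do_halt:  ;                     /* fall out to the next i */
    }
}
```

Calling it with the opcode list { OP_LOAD2, OP_LOAD1, OP_ADD, OP_STORE, OP_HALT } adds x and y elementwise into z.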
Thanks again,
Bob
Dean Anderson wrote:
That's great. It's not quite fixed yet.
You can't turn off alignment where it would result in unexecutable code.
If I recall my x86 assembly, labels have to be word aligned, while the
next instruction doesn't always have to be. Depending on the distance
to the alignment required, an unconditional jmp might be a short cut.
Adding the nop just removed the opportunity for an unconditional jmp.
You'll have to find what happens for every case of misalignment. You
might still need to take care to only insert the nop when necessary.
That is, if your last instruction just happens to come perfectly
aligned, the nop might cause another unconditional jmp to be
inserted.
Still, it occurs to me that the JMP, if it were position independent,
and the next buffer is always at the right alignment (the target of the
JMP), should cause no trouble. Try adding -fPIC to your compiler
options.
--Dean
On Wed, 2 Sep 2009, Robert Bernecky wrote:
Hi, Dean.
My initial attempt at compiler options was just -O0.
That resulted in the jmp insertion problem, so I conjectured
that there might be some alignment requirements/desires that
would result in jmp instructions being added to make each
labeled fragment start on an "appropriate" boundary.
Clearly, the no-align options did not help.
So, I just tried out your suggestion:
#define OP(nm, cod) \
FS##nm: cod \
asm("nop" : : ); \
FE##nm:
This has the effect of inserting a NOP at the end of each code fragment.
And, it DOES appear to work (although I just quickly eyeballed
the asm code, so I might be missing something). I'll give
it a more careful workover tomorrow. (That was WITH the current
-noalign options still active.)
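To show the macro in context, here is a rough sketch of a function that uses OP for four fragments and records each fragment's start address and length from its FS/FE labels. The struct frag type, the RECORD helper, and add_once are hypothetical names made up for this illustration, and __asm__ is used in place of asm only so that it also compiles in strict ISO modes. The recorded lengths mean something only if the compiler keeps each fragment contiguous and in source order, which it is not obliged to do.

```c
#include <assert.h>
#include <stddef.h>

/* The OP macro: fragment body, a trailing NOP, and start/end labels
   FS<name> and FE<name>. */
#define OP(nm, cod) \
    FS##nm: cod \
    __asm__ volatile("nop"); \
    FE##nm:

/* Hypothetical record of one fragment's address and byte length. */
struct frag { void *start; ptrdiff_t len; };

/* Hypothetical helper: store fragment `nm` in table slot `i`. */
#define RECORD(i, nm) \
    do { \
        tab[i].start = &&FS##nm; \
        tab[i].len = (char *)&&FE##nm - (char *)&&FS##nm; \
    } while (0)

/* Runs the four fragments by plain fall-through and fills tab[0..3]. */
long add_once(long a, long b, struct frag *tab)
{
    long reg1, reg2, regz, out;

    assert(tab != NULL);
    RECORD(0, Load1);
    RECORD(1, Load2);
    RECORD(2, Add);
    RECORD(3, Store);

    OP(Load1, reg1 = a;)
    OP(Load2, reg2 = b;)
    OP(Add,   regz = reg1 + reg2;)
    OP(Store, out = regz;)
    return out;   /* FEStore labels this statement */
}
```

Checking the assembler output (gcc -S) is still needed to see whether a jmp ended up inside any fragment.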
Now, what was it that led you to propose that inserting a NOP
would have the desired effect?
Many thanks for your reply!
Robert
Dean Anderson wrote:
I suspect it does this because of instruction alignment and pipelining
issues. Why are you trying to turn off alignment?
You might try adding a nop after each one.
--Dean
On Tue, 1 Sep 2009, Robert Bernecky wrote:
I'm trying to get gcc version 4.3.2 to emit X86-64 code
fragments that I can catenate to perform my own JIT
compilation, but the compiler is being recalcitrant.
(I was using a jump table, but its performance was underwhelming.)
Roughly, what I've done is to create a set of code fragments,
with labels so that I can determine their address (via &&label)
and length. E.g.,
topLoad1: reg1 = x[i];
botLoad1:
topLoad2: reg2 = y[i];
botLoad2:
topAdd: regz = reg1 + reg2;
botAdd:
topStore: z[i] = regz;
botStore:
Then, I have a table of fragment addresses (topLoad1, topLoad2, etc.)
and lengths (botLoad1-topLoad1, botLoad2-topLoad2), and a
(statically unknown) list of fragments to be assembled to build
working code, e.g.:
(Load2, Load1, Add, Store, Loop)
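The table-plus-list scheme can be sketched as below. This is only an illustration under stated assumptions: struct fragment and catenate are hypothetical names, and the sketch copies arbitrary bytes, so it says nothing about whether the copied code would actually run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table entry: where a fragment starts and how many bytes it spans
   (e.g. topLoad1 and botLoad1 - topLoad1). */
struct fragment { const unsigned char *addr; size_t len; };

/* Copies the fragments selected by order[0..count) into buf, back to
   back. Returns the number of bytes written, or 0 if buf is too small. */
size_t catenate(const struct fragment *table, const int *order,
                size_t count, unsigned char *buf, size_t cap)
{
    size_t used = 0;

    assert(table != NULL && order != NULL && buf != NULL);
    for (size_t k = 0; k < count; k++) {
        const struct fragment *f = &table[order[k]];
        if (f->len > cap - used)
            return 0;                   /* would overflow the buffer */
        memcpy(buf + used, f->addr, f->len);
        used += f->len;
    }
    return used;
}
```

In real use buf would have to be executable memory (for example, obtained with mmap and PROT_EXEC), and every fragment would have to be position independent and free of jumps out of itself.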
I assemble the fragments into a code buffer and jump to it,
or so the story goes. Unfortunately, what I'm seeing in the
generated code fragments is not fun:
1. GCC sometimes, but NOT always, inserts jumps to the next
fragment. E.g.:
----------------------------------------------
.L46:
.loc 2 34 0
movq -264(%rbp), %rax
movq %rax, -40(%rbp)
.L47:
.L7:
.loc 2 40 0
movl %r8d, %eax
jmp .L48
.L6:
.L48:
.loc 2 43 0
movl %r11d, %ecx
.L49:
.L50:
----------------------------------------------
Note the jmp .L48. If GCC always inserted a jump, I could
remove it, or if it never inserted the jump, I'd be even
happier, but it only does it now and then. I tried adding
my own jumps to force this:
topLoad2: reg2 = y[i];
goto botLoad2;
botLoad2:
but GCC removed them. And inserted others.
Today, I'm using these compiler options:
gcc -O0 -ggdb -mtune=opteron -fno-align-labels -fno-align-jumps
So, I welcome suggestions on how to solve or work around these
problems. Or even a completely different approach.
Thanks,
Robert