RE: Help needed: Optimization of bytecode interpreter for ARM platform

"Andrew Haley" <aph@xxxxxxxxxxx> · Fri, 8 Dec 2006 17:21:52 +0000

de Brebisson, Cyrille (Calculator Division) writes:

 > [snip] trying to re-code, using inline assembly goto *jump[*progc++]
 > I used inline assembly to do:
 > ldrh instr, [progc], #2       // note that in most cases, there is an
 >                               // extra instruction here that allows to
 >                               // cancel the waitstate caused by the use
 >                               // of register instr on the next
 >                               // instruction
 > ldr pc, [jump, instr, asl #2]
 > 
 > because the compiler generates the highly unoptimized (and too large for
 > the memory in my device)
 > 	ldrh	r1, [r4], #2
 > 	ldr	r8, .L2691+4
 > 	ldr	fp, [r8, r1, asl #2]
 > 	mov	pc, fp	@ indirect register jump
 > [/snip]
 > 
 > >This is the crucial mistake: you can't jump out of an inline asm.
 > 
 > So, how can I optimize my code? Is there a way to force the compiler to
 > 1: put a variable in a register? As the asm ("register"); constraint
 > does not seem to do a lot of forcing

Definitely: if declaring a global register variable doesn't work,
that's a bug.  What exactly did you try?

 > 2: get the compiler to condense the last 2 instructions in 1?

I'm not sure why gcc generates that sequence.  Forwarding to Richard
Earnshaw for comment.
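
For reference, the dispatch can be kept entirely in C with GCC's
labels-as-values extension, so that the compiler sees every jump.  The
following is only a minimal sketch, not the original interpreter: the
opcode names, the handlers and the execute() signature are made up for
illustration, and it makes no claim about the code gcc will emit for it
on ARM.

```c
/* Hypothetical opcodes for illustration; not taken from the original post. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Run a program of 16-bit opcodes and return the accumulator. */
static int execute(const unsigned short *progc)
{
    /* Table of label addresses (GNU C "labels as values" extension). */
    static void *const jump[] = { &&op_inc, &&op_dec, &&op_double, &&op_halt };
    int a = 0;  /* virtual machine register */

/* Fetch the next opcode, advance progc, and jump to its handler.
   Written in C rather than inline asm so the compiler knows about the jump. */
#define NEXT goto *jump[*progc++]

    NEXT;
op_inc:    a += 1; NEXT;
op_dec:    a -= 1; NEXT;
op_double: a *= 2; NEXT;
op_halt:   return a;
#undef NEXT
}
```

With the program { OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT },
execute() returns 3.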

Andrew.

 > -----Original Message-----
 > From: Andrew Haley [mailto:aph@xxxxxxxxxxx] 
 > Sent: 08 December 2006 09:43
 > To: de Brebisson, Cyrille (Calculator Division)
 > Cc: gcc-help@xxxxxxxxxxx
 > Subject: Re: Help needed: Optimization of bytecode interpreter for ARM
 > platform
 > 
 > de Brebisson, Cyrille (Calculator Division) writes:
 >  > Hello,
 >  > 
 >  > I hope that this is the best location to ask this question, if not,
 > please accept my apologies and redirect me where needed.
 >  > 
 >  > I am trying to write a fast byte code interpreter, but the compiler
 > optimizer just 'does not get it' and generates bad code (it does not
 > realize that there are jumps everywhere and optimizes the code
 > out)...
 >  > 
 >  > Here is a simplified version of the code:
 >  > 
 >  > static int rom[]= 
 >  >   { 0, 1, 2, 3, 4, 5, 6, 7, 8, 
 >  >     9, 10, 11, 12, 13, 14, }; // the 'program'
 >  >  
 >  > 
 >  > void execute()
 >  > 
 >  > {
 >  > 
 >  >   const void * const jumps[] = 
 >  >     { &&ins000, &&ins001, &&ins002, &&ins003, 
 >  >       &&ins004, &&ins005, &&ins006, &&ins007 }; // table of jumps
 >  > 
 >  >   register int carry asm ("r0");
 >  >   register int instr asm("r1"); // currently executed instruction
 >  >   register int *pc asm ("r4"); // program counter, points on next instr.
 >  >   register const void * const * jm asm ("r5") = jumps; //pointer jump table
 >  > 
 >  > int a=0, b=0; // virtual machine registers
 >  > 
 >  > // this macro does a fast carry=0; goto *jumps[*pc++]; 
 >  > #define next asm ("ldrh %2, [%0], #2\n\t" \
 >  >                    "mov %1, #0\n\t" \
 >  >                    "ldr pc, [%4, %2, asl #2]" : \
 >  >                    "=r" (pc), \
 >  >                    "=r" (carry), \
 >  >                    "=r" (instr): \
 >  >                    "0" (pc), \
 >  >                    "r" (jm)) 
 >  > 
 >  > // this macro does a fast goto *jumps[*pc++]; 
 >  > #define nextnocarry asm ("ldrh %1, [%0], #2\n\t"\
 >  >                          "ldr pc, [%3, %1, asl #2]" : \
 >  >                          "=r" (pc), \
 >  >                          "=r" (instr) : \
 >  >                          "0" (pc), \
 >  >                          "r" (jm))
 > 
 > This is the crucial mistake: you can't jump out of an inline asm.
 > 
 > Andrew.
 >