On Fri, 2006-12-08 at 17:21 +0000, Andrew Haley wrote:
> de Brebisson, Cyrille (Calculator Division) writes:
> > > [snip] trying to re-code, using inline assembly goto *jump[*progc++]
> > I used inline assembly to do:
> >   Ldrh instr, [progc], #2   // note that in most cases, there is an
> >                             // extra instruction here that allows to
> >                             // cancel the waitstate caused by the use
> >                             // of register instr on the next instruction
> >   ldr pc, [jump, instr, asl #2]
> >
> > because the compiler generates the highly unoptimized (and too large for
> > the memory in my device)
> >   ldrh r1, [r4], #2
> >   ldr r8, .L2691+4
> >   ldr fp, [r8, r1, asl #2]
> >   mov pc, fp @ indirect register jump
> > [/snip]
> >
> > >This is the crucial mistake: you can't jump out of an inline asm.
> >
> > So, how can I optimize my code? Is there a way to force the compiler to
> > 1: put a variable in a register? As the asm ("register"); constraint
> > does not seem to do a lot of forcing
>
> Definitely: if declaring a global register variable doesn't work,
> that's a bug.  What exactly did you try?
>
> > 2: get the compiler to condense the last 2 instructions in 1?
>
> I'm not sure why gcc generates that sequence.  Forwarding to Richard
> Earnshaw for comment.

First of all, you don't mention which version of the compiler you are
using, so it's hard to know precisely why you get the code you do.
GCC-4.1 is used in my example below.

Trying to second guess the compiler is rarely profitable, but it's not
clear to me why the address of the jump table is not being hoisted out
of the loop.

There is a hack that will effectively force this in this instance.  By
loading a global variable (or you could pass it in as an additional
parameter such that it is always zero), we force the address calculation
into a local variable that the compiler can't (easily) optimize away.

For the following test-case:

    int offset = 0;

    void runprog(unsigned short *prog, int count)
    {
      __label__ code0, code1, code2, code3;
      static const void* const jump[4] =
        { &&code0, &&code1, &&code2, &&code3 };
      const void* const* interp = jump+offset;

      while (count--)
        {
          goto *interp[*prog++];

        code0:
          foo();
          continue;
        code1:
          bar();
          continue;
        code2:
          wibble();
          continue;
        code3:
          wombat();
          break;
        }
    }

The critical part of the loop then compiles to:

    ldrh    r3, [r5], #2
    ldr     pc, [r6, r3, asl #2]    @ indirect memory jump

which looks fine to me.

Note, however, that if your 'switch' statement is large, then you'll
quite probably get spilling of variables.  The value of interp is highly
likely to be a candidate here because it's used exactly once per
iteration, so you'll then be back to where you started.

I'm somewhat confused as to why you haven't just used a switch table for
this, though.  The equivalent code:

    void runprog(unsigned short *prog, int count)
    {
      while (count--)
        {
          switch(*prog++)
            {
            case 0:
              foo();
              continue;
            case 1:
              bar();
              continue;
            case 2:
              wibble();
              continue;
            case 3:
              wombat();
              goto done;
            }
        }
     done:
      ;
    }

is much easier to understand and much more amenable to the standard
optimizer framework.

R.
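
For anyone who wants to try the two variants side by side, here is a
minimal self-contained sketch.  It is not part of the original test-case:
the handlers foo/bar/wibble/wombat are hypothetical stubs that just append
a letter to a trace buffer, and emit, trace_of and check_variants_agree
are helper names invented for the illustration.  It relies on the GNU C
"labels as values" extension, so it assumes gcc or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trace buffer recording which handlers ran (illustration only). */
static char trace[64];
static size_t trace_len;

static void emit(char c)
{
  if (trace_len + 1 < sizeof trace)
    {
      trace[trace_len++] = c;
      trace[trace_len] = '\0';
    }
}

/* Hypothetical stand-ins for the handlers named in the post. */
static void foo(void)    { emit('f'); }
static void bar(void)    { emit('b'); }
static void wibble(void) { emit('w'); }
static void wombat(void) { emit('x'); }

/* Always zero; read at run time so that the table address ends up
   in a local variable, as described in the post. */
int offset = 0;

/* Computed-goto dispatch (GNU C extension).  Opcodes must be 0..3;
   anything else indexes past the table and is undefined. */
static void run_goto(const unsigned short *prog, int count)
{
  static const void *const jump[4] =
    { &&code0, &&code1, &&code2, &&code3 };
  const void *const *interp = jump + offset;

  while (count--)
    {
      goto *interp[*prog++];

    code0: foo();    continue;
    code1: bar();    continue;
    code2: wibble(); continue;
    code3: wombat(); break;      /* opcode 3 stops the interpreter */
    }
}

/* The same interpreter written as a plain switch. */
static void run_switch(const unsigned short *prog, int count)
{
  while (count--)
    {
      switch (*prog++)
        {
        case 0: foo();    continue;
        case 1: bar();    continue;
        case 2: wibble(); continue;
        case 3: wombat(); goto done;
        }
    }
 done:
  ;
}

typedef void (*run_fn)(const unsigned short *prog, int count);

/* Clears the trace, runs one variant, and returns the trace. */
static const char *trace_of(run_fn run, const unsigned short *prog, int count)
{
  trace_len = 0;
  trace[0] = '\0';
  run(prog, count);
  return trace;
}

/* Runs both variants on the same program and checks the traces agree. */
static void check_variants_agree(const unsigned short *prog, int count)
{
  char first[sizeof trace];
  strcpy(first, trace_of(run_goto, prog, count));
  assert(strcmp(first, trace_of(run_switch, prog, count)) == 0);
}
```

Calling check_variants_agree on a short program such as {0, 1, 2, 1, 3, 0}
confirms that both loops dispatch identically, including the early exit
on opcode 3.  Compiling with -O2 -S for your target lets you compare the
dispatch sequences directly; what you actually get will depend on the
compiler version and on register pressure in the real interpreter.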