On Sun, 20 Aug 2006 15:43:03 PDT, Steve Freeland wrote:
>
> > From: Richard Earnshaw <Richard.Earnshaw@xxxxxxxxxxxxxxxxxxxxxxx>
> > On Mon, 14 Aug 2006 15:15:46 PDT, Steve Freeland wrote:
> > > Hello,
> > >
> > > I'm attempting to use gcc (the WinARM build, see -v output below) to
> > > compile the following stub program:
> > >
> > > int AEEMod_Load(IShell *pIShell, void *ph, IModule **ppMod)
> > > {
> > >     return ENOMEMORY; /* defined to "2" */
> > > }
> > >
> > > The output disassembles to the following:
> > >
> > > Disassembly of section .text:
> > >
> > > 00000000 <AEEMod_Load>:
> > >    0: e1a0c00d  mov   ip, sp
> > >    4: e92dd800  stmdb sp!, {fp, ip, lr, pc}
> > >    8: e24cb004  sub   fp, ip, #4 ; 0x4
> > >    c: e24dd00c  sub   sp, sp, #12 ; 0xc
> > >   10: e50b0010  str   r0, [fp, #-16]
> > >   14: e50b1014  str   r1, [fp, #-20]
> > >   18: e50b2018  str   r2, [fp, #-24]
> > >   1c: e3a03002  mov   r3, #2 ; 0x2
> > >   20: e1a00003  mov   r0, r3
> > >   24: e24bd00c  sub   sp, fp, #12 ; 0xc
> > >   28: e89da800  ldmia sp, {fp, sp, pc}
> > >
> > > This code crashes the device I'm working with.
> > >
> > > There seems to be a problem with the use of stmdb and ldmia to save
> > > and restore the register context to the stack. The stmdb instruction
> > > saves 4 registers, and the ldmia only restores 3 of them, one of
> > > which (sp) isn't in the original 4. This trick seems to be common,
> > > but I don't understand how it works. What order are the registers
> > > saved and loaded in? As far as I can tell, at the end the pc register
> > > ends up with either the original fp or sp.
> > >
> > > What needs to happen is for the original sp to be restored, and the
> > > original lr to be loaded into pc (the "return").
> > >
> > > If I change the two last lines to the following:
> > >
> > >   24: e24bd004  sub   sp, fp, #4 ; 0x4
> > >   28: e12fff1e  bx    lr
> > >
> > > The code works correctly. Can anyone explain this to me?
>
> First of all, thanks for the response!
>
> > Hmm, the code looks correct to me (note that although IP is saved, this
> > is just a copy of SP).
>
> Aaah yes, I missed that. Although the issue is tangential to my immediate
> problem, I'd really appreciate it if you could explain how that
> stmdb/ldmia trick works. My understanding is that registers are saved in
> order (r0 first, r15/pc last) and loaded in reverse order. But that would
> mean the original value of r15/pc that gets saved onto the stack at 0x4
> gets *loaded* right back into r15/pc at 0x28. Obviously that can't be
> right... what am I missing?
>

ldm and stm can normally be matched in pairs, so for example stmdb
(Decrement Before) can be matched with ldmia (Increment After). The
registers in both instructions are always in the same order, with the
lowest numbered register at the lowest address in memory and incrementing
upwards from there.

To make things a bit easier when talking about stacks, you can also name
the stack layout in the instructions themselves and write your ldm and stm
instructions using the stack mnemonics. The most common layout is a
'full-descending' stack (the stack grows by moving to a lower address, and
the bottom word, the one sp addresses, contains data -- it's full); with
the stack mnemonics, stmdb is written stmfd and ldmia is written ldmfd.

So

    stmdb   sp!, {r4, r5, r6}
    ldmia   sp!, {r4, r5, r6}

will push r4-r6 onto the stack and then pop them off again. In fact, the
above idiom is so common on ARM processors that in Thumb mode the
equivalent instructions are known literally as push and pop[1], and for
the above you would simply write

    push    {r4, r5, r6}
    pop     {r4, r5, r6}

Now, going back to your original example, you will see that the compiler
pushes 4 words onto the stack at the start of the function, but at the end
it only pops 3 words off. How does this work and not leave the stack
corrupted?
The answer is that GCC is also saving the stack pointer on the stack, so
when the pop happens at the end the original value of the stack pointer
(which we copied into IP before we started messing with the stack at all)
is restored directly. The final ldmia reads three words upwards from
fp - 12: the saved fp goes back into fp, the saved IP (the original sp)
goes into sp, and the saved lr goes into pc, which performs the return.
The fourth word, the saved pc, is never reloaded.

Another question you are probably asking is 'why do we push all those
registers which are never needed?' The answer to that one is twofold:

1) The compiler is setting up a stack frame. These are useful when trying
   to debug your code, particularly on older debuggers. The stack frame
   allows a debugger to print out the call back-trace if things go wrong.
   With modern debug technology and debug descriptions such as DWARF a lot
   of this information is no longer strictly necessary and the debugger
   can work out what happened with much less help from the compiled code
   -- and indeed it will do better if you address point 2 below ;-)

2) You didn't turn the optimizer on. Use -O or -O2 (or even -Os) and the
   compiler will simplify the generated code significantly.

> Also, this:
> http://en.wikibooks.org/wiki/ARM/Programmer%27s_Model#Program_Counter
> claims that r15/pc can't always be manipulated like any other register.
> Is that correct? If so, is using ldmia into r15/pc always ok?
>

R15 is the program counter, and it's true that you can't treat it entirely
like a normal register, but you can load and store its value; you can copy
it to other registers; and you can use it in some simple addressing
operations (either to generate an address value in a register, or directly
in a pc-relative load instruction). For example, it's perfectly acceptable
to write

    ldr     r0, [pc, #32]

and this will load the word that is 40 bytes beyond the address of the
current instruction (in ARM state the PC always reads 8 higher than the
address of the current instruction). However, the above is somewhat hard
to remember, so the assembler will normally allow you to write

    ldr     r0, L32
    ...
L32: .word   0x12345678

and it will work out the values for you (provided your label is within
4KB of the instruction).

> > Let me take a guess. You are using something like an ARM920 (or an
> > ARM7TDMI) device, and you are calling this function from Thumb code.
> > If so, then you need to compile your function with -mthumb-interwork,
> > then it will generate a return sequence that switches correctly back
> > to Thumb.
>
> You're correct that it's an ARM7TDMI device. The code which calls my code
> is in firmware, so I'm not entirely sure whether it's Thumb or not... If
> you're right, then presumably the calling procedure puts the appropriate
> flag in the LSB of r14/lr and expects the procedure being called to use
> bx and not ldmia to return. Is it the case, then, that the switch to or
> from Thumb mode can't be done by modifying pc with ldmia? If so, that
> would certainly explain the crash. I'll try that as soon as I get the
> chance.

The ARM7TDMI is an implementation of version 4T of the ARM architecture,
often termed ARMv4T. This was the first revision of the architecture to
support Thumb, and support for compiling your application as a mixture of
both ARM and Thumb code (termed interworking) was fairly limited: the only
way you could switch states was by using the BX instruction. The next
revision of the architecture added support for state switching to ldr and
ldm instructions as well, which makes interworking much more efficient.

If you compile your original example with -mthumb-interwork you'll
probably see a return sequence something like

  24: e24bd00c  sub   sp, fp, #12 ; 0xc
  28: e89d6800  ldmia sp, {fp, sp, lr}
  2c: e12fff1e  bx    lr

>
> Thanks again, and it'd be great if you could sort me out with respect to
> the stmdb/ldmia trick.
>

Anyway, I've probably bamboozled you with more than enough information by
now, so I'd better stop.

Hope the above helps,

R.

[1] The latest version of the ARM state assembly syntax has introduced this idiom too, but you'll need a very recent version of the GNU assembler to use it, and GCC still uses the old syntax at present.
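
For anyone who wants to check the stack-frame explanation by hand, here is a
small C sketch that replays the prologue and epilogue from the original
disassembly on a toy register file and memory. It is only an illustration
under stated assumptions: the Cpu struct, the helper names (stmdb_wb,
ldmia_nowb, run_frame) and the 64-word memory are all invented for this
example and are not part of GCC or any ARM tool. The word stored for pc is
simply whatever the toy register holds, since its value does not matter
here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model for illustration only, not a real ARM emulator. */
enum { FP = 11, IP = 12, SP = 13, LR = 14, PC = 15, NREGS = 16 };
enum { MEM_WORDS = 64 };            /* toy memory: byte addresses 0x00..0xfc */

typedef struct {
    uint32_t r[NREGS];
    uint32_t mem[MEM_WORDS];
} Cpu;

#define BIT(n) ((uint16_t)(1u << (n)))

static uint32_t *word_at(Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &c->mem[addr / 4];
}

static uint32_t count_regs(uint16_t list)
{
    uint32_t n = 0;
    for (; list != 0; list >>= 1)
        n += list & 1u;
    return n;
}

/* stmdb base!, {list}: the lowest-numbered register goes to the lowest
   address, and the base is written back with the new (lower) address. */
static void stmdb_wb(Cpu *c, int base, uint16_t list)
{
    uint32_t start = c->r[base] - 4 * count_regs(list);
    uint32_t addr = start;
    for (int i = 0; i < NREGS; i++) {
        if (list & BIT(i)) {
            *word_at(c, addr) = c->r[i];
            addr += 4;
        }
    }
    c->r[base] = start;
}

/* ldmia base, {list} without writeback: registers are loaded in ascending
   order from the base address upwards. */
static void ldmia_nowb(Cpu *c, int base, uint16_t list)
{
    uint32_t addr = c->r[base];     /* latch the base: it may be in the list */
    for (int i = 0; i < NREGS; i++) {
        if (list & BIT(i)) {
            c->r[i] = *word_at(c, addr);
            addr += 4;
        }
    }
}

/* Replays the prologue and epilogue of the unoptimised AEEMod_Load. */
void run_frame(Cpu *c, uint32_t sp0, uint32_t fp0, uint32_t lr0)
{
    c->r[SP] = sp0;
    c->r[FP] = fp0;
    c->r[LR] = lr0;

    c->r[IP] = c->r[SP];                                     /* mov   ip, sp */
    stmdb_wb(c, SP, BIT(FP) | BIT(IP) | BIT(LR) | BIT(PC));  /* stmdb sp!, {fp, ip, lr, pc} */
    c->r[FP] = c->r[IP] - 4;                                 /* sub   fp, ip, #4 */
    c->r[SP] -= 12;                                          /* sub   sp, sp, #12 */

    /* function body omitted */

    c->r[SP] = c->r[FP] - 12;                                /* sub   sp, fp, #12 */
    ldmia_nowb(c, SP, BIT(FP) | BIT(SP) | BIT(PC));          /* ldmia sp, {fp, sp, pc} */
}
```

Calling run_frame with an entry sp of 0x80 leaves sp back at 0x80, fp at
its entry value and pc holding the entry lr; the saved pc word at the top
of the frame is written once and never read back.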