[working with ARM XScale processor and GCC cross-compiler]

I am trying to write an inline function that does a 32-bit by 32-bit multiply with a 64-bit result (32x32->64). I want to save the result into a long long, and I want to use the SMULL instruction followed by an STRD instruction.

SMULL: Returns the 64-bit product of a 32x32 multiplication in two registers
STRD: Stores the contents of two consecutive registers into two consecutive memory slots

So far I have:

    static inline void MULT32_32_64(volatile long long* result, int a, int b)
    {
        asm volatile("SMULL r2, r3, %[a], %[b]\n\t"
                     "STRD r2, %[result]\n\t"
                     : [result] "=m" (*result)
                     : [a] "r" (a), [b] "r" (b)
                     : "r2", "r3", "memory"
                    );
        return;
    }

Here's what's strange: it works correctly at every optimization level except -O2 (so it does work with -O3). With -O2, the 64-bit output gets a very strange value (146406845186048), regardless of input.

Where I think the problem could be:

1. STRD requires an 8-byte aligned address as the second operand
2. GCC does not understand that the value pointed to by 'result' cannot be modified (since the pointer is what's listed in the constraints)
3. I am not correctly understanding how 64-bit types are stored in memory (but then why does it work for 3 of the 4 optimization levels?)
4. It cannot be done without using extra cycles on operations like "result = (high << 32) + low"

Thanks in advance for your thoughts!

-- 
View this message in context: http://www.nabble.com/Storing-64-bit-result-on-ARM-using-inline-assembly-tp24296951p24296951.html
Sent from the gcc - Help mailing list archive at Nabble.com.
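
For comparison with points 2 and 4 above, here is a minimal sketch of one alternative. It is not a confirmed fix for the -O2 behaviour. It returns the product by value instead of storing it from inside the asm, so the compiler picks the registers and emits the store itself. The helper name mult32_32_64 and the use of <stdint.h> types are assumptions for illustration and are not from the code above. The early-clobber ("=&r") outputs are there because, on ARMv5-class cores such as XScale, SMULL requires RdLo, RdHi and Rm to be different registers. On non-ARM targets, and in Thumb state, the sketch falls back to a plain C widening multiply, which GCC typically turns into a single SMULL on ARM anyway.

```c
#include <stdint.h>

/* Hypothetical helper (name not from the original post):
 * signed 32x32 -> 64 multiply, returned by value. */
static inline int64_t mult32_32_64(int32_t a, int32_t b)
{
#if defined(__arm__) && !defined(__thumb__)
    uint32_t lo;  /* low 32 bits of the product (RdLo)  */
    int32_t  hi;  /* high 32 bits of the product (RdHi) */

    /* SMULL RdLo, RdHi, Rm, Rs. The compiler chooses all four registers.
     * "=&r" marks the outputs as early-clobber so that they never share
     * a register with an input operand. No "memory" clobber or volatile
     * is needed because the asm only reads and writes registers. */
    __asm__("smull %0, %1, %2, %3"
            : "=&r" (lo), "=&r" (hi)
            : "r" (a), "r" (b));

    /* Reassemble the two halves into one 64-bit value. */
    return (int64_t)(((uint64_t)(uint32_t)hi << 32) | lo);
#else
    /* Portable fallback: widen one operand before multiplying so the
     * multiplication itself is carried out in 64 bits. */
    return (int64_t)a * (int64_t)b;
#endif
}
```

A caller would then write something like "long long r = mult32_32_64(a, b);" and leave the choice of store instruction (STRD or two STRs) to the compiler, which also knows the alignment of the destination.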