My first version of get_be32() was a macro that did this:

    #define SHA_SRC(t) \
      ({ unsigned char *__d = (unsigned char *)&data[t]; \
         (__d[0] << 24) | (__d[1] << 16) | (__d[2] << 8) | (__d[3] << 0); })

With such a construct, gcc would always allocate a register to hold __d and then dereference that with an offset from 0 to 3. Whereas:

    #define SHA_SRC(t) \
      ({ unsigned char *__d = (unsigned char *)data; \
         (__d[(t)*4 + 0] << 24) | (__d[(t)*4 + 1] << 16) | \
         (__d[(t)*4 + 2] <<  8) | (__d[(t)*4 + 3] <<  0); })

does produce optimal assembly, as only the register holding the data pointer is dereferenced with the absolute byte offset. I suspect your usage of inline functions has the same effect as the first SHA_SRC definition above.
Yes, that's what happens.

Paolo
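For anyone who wants to compare the two addressing styles outside of the SHA-1 code, here is a minimal standalone sketch written as inline functions instead of macros. The function names, the uint32_t type for data, and the self-check are assumptions made for illustration and are not taken from the patch under discussion. The sketch also casts each byte to uint32_t before shifting, which the macros above do not do. Both variants return the same value; any difference would only show up in the generated assembly, and whether a given gcc version still keeps a separate register for the first variant is something to check with gcc -O2 -S rather than assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Variant 1 (hypothetical name): form a pointer to word t first,
 * then read bytes at offsets 0..3 from that per-word pointer. */
static inline uint32_t be32_via_word_ptr(const uint32_t *data, size_t t)
{
    const unsigned char *d = (const unsigned char *)&data[t];
    return ((uint32_t)d[0] << 24) | ((uint32_t)d[1] << 16) |
           ((uint32_t)d[2] <<  8) | ((uint32_t)d[3] <<  0);
}

/* Variant 2 (hypothetical name): keep the single base pointer and
 * read bytes at absolute offsets t*4 + 0..3 from it. */
static inline uint32_t be32_via_base_ptr(const uint32_t *data, size_t t)
{
    const unsigned char *d = (const unsigned char *)data;
    return ((uint32_t)d[t * 4 + 0] << 24) | ((uint32_t)d[t * 4 + 1] << 16) |
           ((uint32_t)d[t * 4 + 2] <<  8) | ((uint32_t)d[t * 4 + 3] <<  0);
}

/* Self-check: both variants must decode the same big-endian words. */
static void be32_self_check(void)
{
    static const unsigned char bytes[8] = {
        0x01, 0x02, 0x03, 0x04, 0xde, 0xad, 0xbe, 0xef
    };
    uint32_t words[2];
    size_t t;

    /* Copy raw bytes into word storage so the result does not
     * depend on the host byte order. */
    memcpy(words, bytes, sizeof(words));
    for (t = 0; t < 2; t++)
        assert(be32_via_word_ptr(words, t) == be32_via_base_ptr(words, t));
}
```

Calling be32_self_check() from a small main() and comparing the output of gcc -O2 -S for the two functions is one way to see whether the compiler at hand treats them differently.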