On Tue, 11 Aug 2009, Nicolas Pitre wrote:
>
> #define SHA_SRC(t) \
> 	({ unsigned char *__d = (unsigned char *)&data[t]; \
> 	   (__d[0] << 24) | (__d[1] << 16) | (__d[2] << 8) | (__d[3] << 0); })
>
> And this provides the exact same performance as the ntohl() based
> version (4.980s), except that this now copes with unaligned buffers too.

Is it better to do a (conditional) memcpy up front? Or is the byte-based
version better simply because you always end up doing the shifting anyway,
since most ARM configurations are little-endian?

I _suspect_ that most large SHA1 calls from git are pre-aligned. The big
SHA1 calls are for pack-file verification in fsck, which should all be
aligned. The same goes for index file integrity checking.

The actual object SHA1 calculations are likely not aligned (we do that
object header thing), and if you can't do the htonl() any better way, I
guess the byte-based approach is the way to go.

		Linus

---
 block-sha1/sha1.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 9bc8b8a..df27e66 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -25,6 +25,12 @@ void blk_SHA1_Init(blk_SHA_CTX *ctx)
 	ctx->H[4] = 0xc3d2e1f0;
 }
 
+#ifdef REALLY_SLOW_UNALIGNED
+  #define is_unaligned(ptr)	(3 & (unsigned long)(ptr))
+#else
+  #define is_unaligned(ptr)	0
+#endif
+
 void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
 {
@@ -47,7 +53,12 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
 		blk_SHA1Block(ctx, ctx->W);
 	}
 	while (len >= 64) {
-		blk_SHA1Block(ctx, data);
+		const unsigned int *block = data;
+		if (is_unaligned(data)) {
+			memcpy(ctx->W, data, 64);
+			block = ctx->W;
+		}
+		blk_SHA1Block(ctx, block);
 		data += 64;
 		len -= 64;
 	}
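
To make the trade-off above concrete, here is a minimal, self-contained
sketch of the two strategies being compared: the byte-wise big-endian load
(the SHA_SRC approach from the quoted mail) and the conditional memcpy into
an aligned staging buffer (the approach of the patch above). The helper name
get_be32 and the test harness are illustrative only, not git's actual code;
is_unaligned simply mirrors the macro from the patch.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>		/* ntohl() */

/*
 * Byte-wise big-endian load, in the spirit of the SHA_SRC macro:
 * byte loads can never be misaligned, and the shifts do the endian
 * conversion for free.  (get_be32 is an illustrative name, not git's.)
 */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] <<  8) | ((uint32_t)p[3] <<  0);
}

/* Same test as the patch's is_unaligned() macro. */
static int is_unaligned(const void *p)
{
	return 3 & (uintptr_t)p;
}

int main(void)
{
	unsigned char raw[68];
	uint32_t scratch[16];	/* aligned staging buffer, like ctx->W */
	const unsigned char *block = raw + 1;	/* deliberately misaligned */
	const uint32_t *words;
	size_t i;

	for (i = 0; i < sizeof(raw); i++)
		raw[i] = (unsigned char)i;

	/* Strategy 1: byte-wise loads work at any alignment. */
	printf("byte-wise: %08x\n", get_be32(block));

	/*
	 * Strategy 2: memcpy the 64-byte block into an aligned buffer
	 * up front, then keep using aligned word loads plus ntohl().
	 */
	if (is_unaligned(block)) {
		memcpy(scratch, block, 64);
		words = scratch;
	} else
		words = (const uint32_t *)block;
	printf("staged:    %08x\n", (unsigned)ntohl(words[0]));

	return 0;
}

Both paths recover the same big-endian word; which one wins in practice
depends on whether the target machine penalizes (or traps on) unaligned
word loads by more than it costs to copy 64 bytes per block.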