Sebastian Schuberth <sschuberth@xxxxxxxxx> writes:

> As ntohl()/htonl() are function calls (that internally do shifts), I
> doubt they're faster than the shift macros, though I haven't measured
> it. However, I do not suggest going for the macros on Windows/Intel,
> but applying the following patch on top of your patch:

Your proposed commit log message makes it sound as if this change is
limited to Windows, but it is not protected with "#ifdef WIN32"; the
change applies to non-Windows platforms as well, so the message is
misleading.

It should help any i386/amd64 platform whose ntohl()/htonl() is
crappy, as long as __builtin_bswap32() is supported by the compiler.
And it should not harm other platforms, nor i386/amd64 platforms whose
ntohl()/htonl() are sane.

As the i386/amd64 part of block-sha1/sha1.c has a gcc dependency
already, I think it would be safe to assume __builtin_bswap32() is
available.  But I'd want an Ack/Nack from the original authors (Cc'ed).

It seems that your patch is line-wrapped, so please be careful _if_ it
needs to be modified and resent (if this version gets trivially acked,
I can fix it up when applying, and in such a case there is no need to
resend).

> From: Sebastian Schuberth <sschuberth@xxxxxxxxx>
> Date: Tue, 18 Aug 2009 12:33:35 +0200
> Subject: [PATCH] block-sha1: On Intel, use bswap built-in in favor of
>  ntohl()/htonl()
>
> On Windows/Intel, ntohl()/htonl() are function calls that do shifts
> to swap the byte order. Using the native bswap instruction both gets
> rid of the shifts and the function call overhead to gain some
> performance.
>
> Signed-off-by: Sebastian Schuberth <sschuberth@xxxxxxxxx>
> ---
>  block-sha1/sha1.c | 15 ++++++++++-----
>  1 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index f2830c0..07f2937 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -66,15 +66,20 @@
>
>  /*
>   * Performance might be improved if the CPU architecture is OK with
> - * unaligned 32-bit loads and a fast ntohl() is available.
> + * unaligned 32-bit loads and a fast ntohl() is available.  On Intel,
> + * use the bswap built-in to get rid of the function call overhead.
>   * Otherwise fall back to byte loads and shifts which is portable,
>   * and is faster on architectures with memory alignment issues.
>   */
>
> -#if defined(__i386__) || defined(__x86_64__) || \
> -    defined(__ppc__) || defined(__ppc64__) || \
> -    defined(__powerpc__) || defined(__powerpc64__) || \
> -    defined(__s390__) || defined(__s390x__)
> +#if defined(__i386__) || defined(__x86_64__)
> +
> +#define get_be32(p)	__builtin_bswap32(*(unsigned int *)(p))
> +#define put_be32(p, v)	do { *(unsigned int *)(p) = __builtin_bswap32(v); } while (0)
> +
> +#elif defined(__ppc__) || defined(__ppc64__) || \
> +      defined(__powerpc__) || defined(__powerpc64__) || \
> +      defined(__s390__) || defined(__s390x__)
>
>  #define get_be32(p)	ntohl(*(unsigned int *)(p))
>  #define put_be32(p, v)	do { *(unsigned int *)(p) = htonl(v); } while (0)
> --
> 1.6.4.169.g64d5.dirty