Sebastian Schuberth <sschuberth@xxxxxxxxx> writes:

> As ntohl()/htonl() are function calls (that internally do shifts), I
> doubt they're faster than the shift macros, though I haven't measured
> it. However, I do not suggest going for the macros on Windows/Intel,
> but applying the following patch on top of your patch:

Your proposed commit log message makes it sound as if this change is
limited to Windows, but it is not protected with "#ifdef WIN32"; the
change applies to non-Windows platforms as well, so the message is
misleading.

It should help any i386/amd64 platform whose ntohl()/htonl() is
crappy, as long as __builtin_bswap32() is supported by the compiler.
And it should not harm other platforms, nor i386/amd64 platforms whose
ntohl()/htonl() are sane.

As the i386/amd64 part of block-sha1/sha1.c has a gcc dependency
already, I think it would be safe to assume __builtin_bswap32() is
available.  But I'd want an Ack/Nack from the original authors (Cc'ed).

It seems that your patch is line-wrapped, so please be careful _if_ it
needs to be modified and resent (if this version gets trivially acked,
I can fix it up when applying, and in such a case there is no need to
resend).

> From: Sebastian Schuberth <sschuberth@xxxxxxxxx>
> Date: Tue, 18 Aug 2009 12:33:35 +0200
> Subject: [PATCH] block-sha1: On Intel, use bswap built-in in favor of
>  ntohl()/htonl()
>
> On Windows/Intel, ntohl()/htonl() are function calls that do shifts
> to swap the byte order. Using the native bswap instruction both gets
> rid of the shifts and the function call overhead to gain some
> performance.
>
> Signed-off-by: Sebastian Schuberth <sschuberth@xxxxxxxxx>
> ---
>  block-sha1/sha1.c | 15 ++++++++++-----
>  1 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index f2830c0..07f2937 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -66,15 +66,20 @@
>
>  /*
>   * Performance might be improved if the CPU architecture is OK with
> - * unaligned 32-bit loads and a fast ntohl() is available.
> + * unaligned 32-bit loads and a fast ntohl() is available.  On Intel,
> + * use the bswap built-in to get rid of the function call overhead.
>   * Otherwise fall back to byte loads and shifts which is portable,
>   * and is faster on architectures with memory alignment issues.
>   */
>
> -#if defined(__i386__) || defined(__x86_64__) || \
> -    defined(__ppc__) || defined(__ppc64__) || \
> -    defined(__powerpc__) || defined(__powerpc64__) || \
> -    defined(__s390__) || defined(__s390x__)
> +#if defined(__i386__) || defined(__x86_64__)
> +
> +#define get_be32(p)	__builtin_bswap32(*(unsigned int *)(p))
> +#define put_be32(p, v)	do { *(unsigned int *)(p) = __builtin_bswap32(v); } while (0)
> +
> +#elif defined(__ppc__) || defined(__ppc64__) || \
> +      defined(__powerpc__) || defined(__powerpc64__) || \
> +      defined(__s390__) || defined(__s390x__)
>
>  #define get_be32(p)	ntohl(*(unsigned int *)(p))
>  #define put_be32(p, v)	do { *(unsigned int *)(p) = htonl(v); } while (0)
> --
> 1.6.4.169.g64d5.dirty