Re: [PATCH] block-sha1: Windows declares ntohl() in winsock2.h

Sebastian Schuberth wrote:
> As ntohl()/htonl() are function calls (that internally do shifts), I
> doubt they're faster than the shift macros, though I haven't measured
> it. However, I do not suggest to go for the macros on Windows/Intel, but
> to apply the following patch on top of your patch:

> On Windows/Intel, ntohl()/htonl() are function calls that do shifts to
> swap the byte order. Using the native bswap instruction gets rid of both
> the shifts and the function call overhead to gain some performance.

Umm, nothing like this should be needed on Linux; the compiler/glibc
will choose bswap itself (see endian.h and bits/byteswap.h).
I did try using __builtin_bswap32 directly and the result was a few
(3 or 4, iirc) differently scheduled instructions, that's all, no
performance difference.
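
For illustration only, here is a minimal sketch of the portable "shifts"
variant being discussed. The helper names swab32_shifts() and
load_be32_bytes() are made up for this example and are not part of the
patch. Many optimizing compilers recognize the shift-and-mask pattern and
may emit a single bswap for it on x86, though that is an assumption about
the toolchain and not something the C language guarantees.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value
 * using only shifts and masks (no builtin, no library call). */
static inline uint32_t swab32_shifts(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: read a big-endian 32-bit value one byte at a
 * time. Independent of host endianness and of the alignment of p. */
static inline uint32_t load_be32_bytes(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Comparing the generated assembly for these against the ntohl() version
(e.g. with "gcc -O2 -S") is one way to check what a given compiler
actually does.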

>   * Performance might be improved if the CPU architecture is OK with
> - * unaligned 32-bit loads and a fast ntohl() is available.
> + * unaligned 32-bit loads and a fast ntohl() is available. On Intel,
> + * use the bswap built-in to get rid of the function call overhead.
>   * Otherwise fall back to byte loads and shifts which is portable,
>   * and is faster on architectures with memory alignment issues.
>   */
> 
> -#if defined(__i386__) || defined(__x86_64__) || \
> -    defined(__ppc__) || defined(__ppc64__) || \
> -    defined(__powerpc__) || defined(__powerpc64__) || \
> -    defined(__s390__) || defined(__s390x__)
> +#if defined(__i386__) || defined(__x86_64__)
>
> +#define get_be32(p)    __builtin_bswap32(*(unsigned int *)(p))
> +#define put_be32(p, v)    do { *(unsigned int *)(p) = __builtin_bswap32(v); } while (0)
> +
> +#elif defined(__ppc__) || defined(__ppc64__) || \
> +      defined(__powerpc__) || defined(__powerpc64__) || \
> +      defined(__s390__) || defined(__s390x__)

I'd limit it to windows and any other ia32 platform that doesn't pick the
bswaps itself; as is, it just adds an unnecessary hidden gcc dependency.

Hmm, it's actually a gcc-4.3+ dependency, so it won't even build w/ gcc 4.2;
something like this would be required:
"(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))".
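
A rough sketch of how such a guard could look, assuming a fallback to
plain shifts when the builtin is not known to exist. HAVE_BUILTIN_BSWAP32
and guarded_bswap32() are hypothetical names used only here; the check has
to compare the major version separately so that it stays true for any
later major release.

```c
#include <stdint.h>

/* Hypothetical feature macro: 1 for GCC 4.3 or newer (including any
 * later major version), 0 otherwise. */
#if defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#define HAVE_BUILTIN_BSWAP32 1
#else
#define HAVE_BUILTIN_BSWAP32 0
#endif

/* Hypothetical wrapper: use the builtin where the guard allows it,
 * otherwise fall back to portable shifts and masks. */
static inline uint32_t guarded_bswap32(uint32_t v)
{
#if HAVE_BUILTIN_BSWAP32
	return __builtin_bswap32(v);
#else
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
#endif
}
```

Compilers that report themselves as an older GCC simply take the fallback
branch, so the result is the same either way.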

artur