Re: x86 asm SHA1 (draft)

> OK.  I somehow got an impression that your two versions had
> quite different performance characteristics on G4 and G5 and
> there was a real choice.  If they are between a few per-cent,
> then I agree it is not worth doing at all.

My apologies for being unclear.

The place where a noticeable (if not disastrous) difference can appear
is x86, which has a lot more models with "interesting" performance
characteristics.  In particular, Intel is fond of building CPUs with a
very small "sweet spot".

The openssl SHA1 code had to be reworked to not suck on a P4, with the
resultant performance change:

#               compared with original  compared with Intel cc
#               assembler impl.         generated code
# Pentium       -16%                    +48%
# PIII/AMD      +8%                     +16%
# P4            +85%(!)                 +45%

The original code had the most popular round (what I call
ROUND_MIX(F2,...)) implemented as follows, with single-uop
instructions (no load+op) scheduled for the Pentium pipeline.
(A..E are the working variables, S and T are temporaries, and the U/V
column shows which Pentium pipe each instruction is meant to issue in.)

	movl	16(%esp),S	U  \
	movl	24(%esp),T	 V  \
	xorl	S,T		U    \
	movl	48(%esp),S	 V    > "MIX", pentium-optimized
	xorl	S,T		U    /
	movl	4(%esp),S	 V  /
	xorl	S,T		U  /
	movl	B,S		 V
	roll	$1,T		U	Rotate of mix (SHA0 -> SHA1 fix)
	xor	C,S		 V
	mov	T,16(%esp)	U	Store back W[i]
	xor	D,S		 V	Finish computing F(B,C,D) = B^C^D
	lea	K(T,E),E	U	Add K and W[i] to E
	mov	A,T		 V
	roll	$5,T		UV
	rorl	$1,B		U
	add	S,E		 V
	rorl	$1,B		U
	add	T,E		 V

While the P4-optimized version goes:
	movl	B,S
	movl	16(%esp),T
	rorl	$2,B
	xorl	24(%esp),T
	xorl	C,S
	xorl	48(%esp),T
	xorl	D,S		This is F(B,C,D) = B^C^D
	xorl	4(%esp),T
	roll	$1,T		Rotate of mix (SHA0 -> SHA1 fix)
	addl	S,E
	movl	T,16(%esp)
	movl	A,S
	roll	$5,S
	lea	K(E,T),E
	add	S,E

(The original code actually rotates the working variables around 6
registers, not 5, but I've rearranged the last couple of instructions
to rotate around 5.)
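
For anyone who would rather read C than either listing, here is a rough
sketch of what one such round computes.  The names rol32 and
round_mix_f2 are made up for illustration, and I am assuming the
16-word circular W[] buffer implied by the stack offsets above
(16(%esp) is W[4], 24(%esp) is W[6], 48(%esp) is W[12], 4(%esp) is
W[1], i.e. a round with i mod 16 == 4).  It says nothing about
scheduling, which is the whole point of the two asm versions.

```c
#include <stdint.h>

/* Rotate left; only called with n in 1..31, so the shifts are well defined. */
static uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * One "parity" round (F(B,C,D) = B^C^D) with the message mix done in
 * place.  Hypothetical helper, not taken from openssl.
 *   w[]  16-word circular buffer holding W[i-16..i-1]
 *   i    round number
 *   k    round constant (K in the asm)
 */
static void round_mix_f2(uint32_t w[16], unsigned i, uint32_t k,
			 uint32_t a, uint32_t *b, uint32_t c, uint32_t d,
			 uint32_t *e)
{
	/* "MIX": W[i] = rol(W[i-16] ^ W[i-14] ^ W[i-8] ^ W[i-3], 1) */
	uint32_t t = w[i & 15] ^ w[(i + 2) & 15]
		   ^ w[(i + 8) & 15] ^ w[(i + 13) & 15];

	t = rol32(t, 1);		/* the SHA0 -> SHA1 fix */
	w[i & 15] = t;			/* store back W[i] */

	/* E += rol(A,5) + F(B,C,D) + K + W[i], using B before it is rotated */
	*e += rol32(a, 5) + (*b ^ c ^ d) + k + t;

	*b = rol32(*b, 30);		/* same as rorl $2,B */
}
```

The caller is expected to rotate the roles of A..E between rounds, as
the asm does by renaming registers rather than moving data.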