Re: git-diff-tree inordinately (O(M*N)) slow on files with many changes

Linus Torvalds <torvalds@xxxxxxxx> · Mon, 16 Oct 2006 16:24:05 -0700 (PDT)

On Mon, 16 Oct 2006, Junio C Hamano wrote:
> 
> I agree (although I am not sure about the "do it twice for
> small" bit), and I think Davide agrees with you in his reply:

Sure. Davide's all-macro version is fine. I don't like re-using the same 
value twice even in a ALL-CAPS macro, so I'm used to inline functions, but 
all the uses of XDL_HASHLONG() are fine with multiple uses of the 
arguments.

Somebody should just double-check that all the parentheses ended up being 
right ;)

It might be easier to read if you write it as

	#define BITS_IN_LONG	(CHAR_BIT * sizeof(unsigned long))
	#define XDL_HIGHBITS(v,b) ((v) >> (BITS_IN_LONG - (b)))
	#define XDL_MASKBITS(b) ((1UL << (b)) - 1)
	#define XDL_HASHBITS(v,b) (((v) + XDL_HIGHBITS(v,b)) & XDL_MASKBITS(b))
	#define XDL_HASHLONG(v,b) XDL_HASHBITS( (unsigned long)(v) , b )

just to avoid one huge #define.

That said, it unnecessarily calculates "BITS_IN_LONG - (b)" to shift with, 
because it really shouldn't matter _which_ high bits you use for hashing, 
so you might as well just use the "next" b bits, and have

	#define XDL_ADDBITS(v,b)	((v) + ((v) >> (b)))
	#define XDL_MASKBITS(b)		((1UL << (b)) - 1)
	#define XDL_HASHLONG(v,b)	(XDL_ADDBITS((unsigned long)(v), b) & XDL_MASKBITS(b))

which generates better code at least on x86 (and x86-64), because the 
shift count stays the same for all shifts and can thus be kept in %ecx. 
For example, on x86-64, you get

	movq    %rdi, %rax		# copy 'val'
	movl    $1, %edx		# const 1: start generating (1 << b) - 1
	shrq    %cl, %rax		# val >> b
	salq    %cl, %rdx		# 1 << b
	leaq    (%rdi,%rax), %rax	# val + (val >> b)
	subq    $1, %rdx		# (1 << b) -1
	andq    %rdx, %rax		# final hash

which is short and sweet. And on ppc32 (or ppc64) you get

	li 9,1			# const 1: start generating (1 << b) - 1
	srw 0,3,4		# val >> b
	slw 9,9,4		# 1 << b
	add 0,0,3		# val + (val >> b)
	addi 9,9,-1		# (1 << b) - 1
	and 3,0,9		# final hash

in other words, apart from having two shifts (which you can't really 
avoid, although a multiply can do one of them) it's just a very efficient 
way to mix together (2*b) bits into a (b)-bit hash.

But taking the high bits from the "unsigned long" doesn't add _that_ much 
cost. I just suspect that it's a good way to continue to get different 
answers on 32-bit and 64-bit architectures. 

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html