Re: [PATCH] diff-delta: produce optimal pack data

Linus Torvalds <torvalds@xxxxxxxx> · Sat, 25 Feb 2006 11:18:59 -0800 (PST)

On Sat, 25 Feb 2006, Nicolas Pitre wrote:
> 
> Yes, the hash is larger.  There is a cost in memory usage but not really 
> in CPU cycles.

Note that memory usage translates almost 1:1 (or worse) to CPU cycles in 
almost all real-life behaviours. Only in carefully tuned benchmarks does 
it not.

Increased memory usage means more paging, and worse cache behaviour. Now, 
hashes aren't wonderful for caches in the first place, but imagine the 
hump you pass when the data doesn't fit in a 64kB L1 any more (or a 256kB 
L2). Huge.

> > You'll find a lot of that in any file: three or four bytes of similarity 
> > just doesn't sound worthwhile to go digging after. 
> 
> Well after having experimented a lot with multiple parameters I think 
> they are worth it after all.  Not only they provide for optimal deltas, 
> but their hash is faster to compute than larger blocks which seems to 
> counter balance for the cost of increased hash list.

Hey, numbers talk. If you've got the numbers, I'll just shut up ;)

		Linus
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html