Re: What's cooking in git.git (topics)

David Kastrup <dak@xxxxxxx> · Wed, 03 Oct 2007 10:20:49 +0200

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> [ This is the discussed stupid approach - just sort the dang hash array, 
>   so that we can use a linear scan over the src/dst ]
>
> On Tue, 2 Oct 2007, David Kastrup wrote:
>> 
>> This does not actually require an actual merge _sort_ AFAICS: do the
>> "sort file.hashed" step using qsort.  The comparison step does not
>> actually need to produce merged output, but merely advances through
>> two hash arrays and generates statistics.
>> 
>> This should already beat the pants off the current implementation,
>> even when the hash array is sparse, simply because our inner loop then
>> has perfect hash coherence.
>
> Sadly, that's not the case. It *does* seem to beat the current 
> implementation, but it's not "beat the pants off".

Part of the reason is that it is not actually what I had in mind.  Why
create the hash array as a hash array?  Filling the hash array in
basically random order and then sorting and compressing it is what
causes much of the cost.  My idea was to just fill the "hash array"
linearly.  It is quite pointless (and certainly very inefficient with
regard to cache thrashing) to do it in hash order when we are going to
sort it anyway.
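
A minimal sketch of what I mean, under stated assumptions: the names
(chunk_hash, fill_linear, sort_and_squash, bytes_in_common) are
hypothetical, one entry per line is a simplification, and FNV-1a is
only a stand-in hash; none of this is taken from the existing code.
Entries are appended in file order, sorted once with qsort, and the
comparison is then a single linear pass over both sorted arrays.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one hash per line, plus the bytes it covers. */
struct chunk_hash {
	uint32_t hash;
	uint32_t bytes;
};

static int cmp_chunk_hash(const void *a, const void *b)
{
	const struct chunk_hash *x = a, *y = b;
	return (x->hash > y->hash) - (x->hash < y->hash);
}

/*
 * Append one entry per line of buf, in file order (sequential writes).
 * "out" must have room for one entry per line.  Returns the count.
 */
static size_t fill_linear(const char *buf, size_t len, struct chunk_hash *out)
{
	size_t n = 0, i = 0;

	while (i < len) {
		uint32_t h = 2166136261u;	/* FNV-1a, a stand-in hash */
		size_t start = i;

		while (i < len) {
			unsigned char c = (unsigned char)buf[i++];
			h = (h ^ c) * 16777619u;
			if (c == '\n')
				break;
		}
		out[n].hash = h;
		out[n].bytes = (uint32_t)(i - start);
		n++;
	}
	return n;
}

/* Sort by hash, then fold equal hashes into one entry.  Returns new count. */
static size_t sort_and_squash(struct chunk_hash *a, size_t n)
{
	size_t in, out = 0;

	if (!n)
		return 0;
	qsort(a, n, sizeof(*a), cmp_chunk_hash);
	for (in = 1; in < n; in++) {
		if (a[in].hash == a[out].hash)
			a[out].bytes += a[in].bytes;
		else
			a[++out] = a[in];
	}
	return out + 1;
}

/* Advance through both sorted arrays once; no merged output is produced. */
static size_t bytes_in_common(const struct chunk_hash *src, size_t ns,
			      const struct chunk_hash *dst, size_t nd)
{
	size_t i = 0, j = 0, common = 0;

	while (i < ns && j < nd) {
		if (src[i].hash < dst[j].hash) {
			i++;
		} else if (src[i].hash > dst[j].hash) {
			j++;
		} else {
			common += src[i].bytes < dst[j].bytes ?
				  src[i].bytes : dst[j].bytes;
			i++;
			j++;
		}
	}
	return common;
}
```

A caller would run fill_linear and sort_and_squash on each file and
pass both results to bytes_in_common.  Every loop here touches memory
sequentially; the only non-linear step is the qsort itself.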

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum