On Tue, Oct 02, 2007 at 06:31:18PM +0200, David Kastrup wrote:

> This does not actually require an actual merge _sort_ AFAICS: do the
> "sort file.hashed" step using qsort. The comparison step does not
> actually need to produce merged output, but merely advances through
> two hash arrays and generates statistics.

Right, that's why I used "merge" in quotes. The sort used in the O(n)
step is irrelevant, but we are doing a merge-sort-like behavior in the
second step (except instead of actually merging into a new list, we are
summarizing the comparisons in a numeric "difference" variable). But I
think we are on the same page.

> This should already beat the pants off the current implementation,
> even when the hash array is sparse, simply because our inner loop then
> has perfect hash coherence.

Yes, I hope so. We'll see. :)

> Getting rid of this outer O(n^2) remains an interesting challenge,
> though. One way would be the following: fill a _single_ array with
> entries containing _both_ hash and file number. Sort this, and then
> gather the statistics of hash runs by making a single pass through.
> That reduces the O(n^2) behavior to only those parts with actual hash
> collisions.

Interesting. Care to take a stab at implementing it?

-Peff
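
To make the first idea concrete, here is a minimal sketch of the
sorted-array "merge"-style comparison, assuming each file has already
been reduced to a qsort()ed array of (hash, count) entries. The struct
and function names (hashent, hash_difference, and so on) are invented
for illustration; this is not git's actual diffcore-delta code.

#include <stdlib.h>

struct hashent {
	unsigned int hash;	/* hash of a chunk of file content */
	unsigned int count;	/* how much material hashed to this value */
};

static int cmp_hashent(const void *a_, const void *b_)
{
	const struct hashent *a = a_, *b = b_;
	if (a->hash != b->hash)
		return a->hash < b->hash ? -1 : 1;
	return 0;
}

/* Sort one file's hash array once, as part of preprocessing that file. */
static void sort_hashes(struct hashent *ent, size_t nr)
{
	qsort(ent, nr, sizeof(*ent), cmp_hashent);
}

/*
 * Walk two sorted arrays in lockstep, like the merge step of a merge
 * sort, but instead of emitting a merged list just accumulate how much
 * material the two files do NOT have in common.  Both indices only
 * ever advance, so each array is read once, in order.
 */
static unsigned long hash_difference(const struct hashent *a, size_t na,
				     const struct hashent *b, size_t nb)
{
	size_t i = 0, j = 0;
	unsigned long diff = 0;

	while (i < na && j < nb) {
		if (a[i].hash < b[j].hash)
			diff += a[i++].count;
		else if (a[i].hash > b[j].hash)
			diff += b[j++].count;
		else {
			/* shared hash: only the count mismatch differs */
			unsigned int ca = a[i++].count;
			unsigned int cb = b[j++].count;
			diff += ca > cb ? ca - cb : cb - ca;
		}
	}
	/* whatever remains in either array is unique to that file */
	while (i < na)
		diff += a[i++].count;
	while (j < nb)
		diff += b[j++].count;
	return diff;
}

Each pairwise comparison then walks both arrays once in hash order,
which is where the hoped-for win over probing a sparse hash table
comes from.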
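
And a rough sketch of the single-array idea from the last quoted
paragraph, again with made-up names: every (hash, file, count)
occurrence from all files goes into one big array, a single sort puts
equal hashes into contiguous runs, and one pass over those runs credits
the file pairs that share material. The quadratic work is confined to
runs longer than one entry, i.e. to actual hash collisions.

#include <stdlib.h>

struct hash_occurrence {
	unsigned int hash;	/* hash of a chunk of file content */
	unsigned int file;	/* which file it came from */
	unsigned int count;	/* how much of that file hashed to it */
};

static int cmp_occurrence(const void *a_, const void *b_)
{
	const struct hash_occurrence *a = a_, *b = b_;
	if (a->hash != b->hash)
		return a->hash < b->hash ? -1 : 1;
	if (a->file != b->file)
		return a->file < b->file ? -1 : 1;
	return 0;
}

/*
 * "similarity" is assumed to be an nfiles x nfiles matrix (flattened)
 * accumulating the amount of material each pair of files shares.
 */
static void gather_statistics(struct hash_occurrence *occ, size_t nr,
			      unsigned long *similarity, size_t nfiles)
{
	size_t run_start = 0;

	qsort(occ, nr, sizeof(*occ), cmp_occurrence);

	while (run_start < nr) {
		size_t run_end = run_start + 1;
		size_t i, j;

		/* find the end of this run of identical hashes */
		while (run_end < nr && occ[run_end].hash == occ[run_start].hash)
			run_end++;

		/* credit every pair of distinct files appearing in the run */
		for (i = run_start; i < run_end; i++) {
			for (j = i + 1; j < run_end; j++) {
				unsigned int fa = occ[i].file;
				unsigned int fb = occ[j].file;
				unsigned int c = occ[i].count < occ[j].count
					       ? occ[i].count : occ[j].count;
				if (fa != fb)
					similarity[fa * nfiles + fb] += c;
			}
		}
		run_start = run_end;
	}
}

Since entries within a run are sorted by file number, only the upper
triangle of the similarity matrix gets filled here; the pairwise loop
never runs at all for a hash that occurs only once.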