Re: [PATCH 3/4] Use binary searching on large buckets in git-describe.

Junio C Hamano <junkio@xxxxxxx> wrote:
> "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
> 
> > If a project has a really huge number of tags (such as several
> > thousand tags) then we are likely to have nearly a hundred tags in
> > some buckets.  Scanning those buckets as linked lists could take
> > a large amount of time if done repeatedly during history traversal.
> 
> I think this is a good idea -- but would you still need the 256
> buckets if you do this?

Well, yes and no.  The binary search makes each lookup O(log N)
rather than O(N), which is clearly an improvement.  But we are
hashing by the commit SHA1, and since SHA1 distributes values so
evenly we get a fairly uniform spread across the buckets.  That
cuts each bucket down to roughly 1/256 of the full tag set, at the
cost of just one array index operation.

Since each bucket head is so small (one pointer) and the match
function is on the critical path of describe, it seems reasonable
to take the small memory hit in return for better behavior on very
large sets of tags.  It certainly doesn't hurt on smaller tag sets
(like git.git): we wind up with about one tag per bucket, and our
lookups are effectively constant time.

-- 
Shawn.
