Junio C Hamano <junkio@xxxxxxx> wrote:
> "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
>
> > If a project has a really huge number of tags (such as several
> > thousand tags) then we are likely to have nearly a hundred tags in
> > some buckets.  Scanning those buckets as linked lists could take
> > a large amount of time if done repeatedly during history traversal.
>
> I think this is a good idea -- but would you still need the 256
> buckets if you do this?

Well, yes and no.  The binary search would make each lookup O(log N)
rather than O(N), which is clearly an improvement on its own.

But we're hashing by the commit SHA1, and since SHA1 does such a good
job of distributing values around, we get a pretty even spread across
the buckets.  That gives us a further factor-of-256 reduction in the
size of the set we have to search, at the cost of just one array index
operation.

Since the buckets are so small (one pointer each) and the match
function is in the critical path of describe, it seems reasonable to
take the small memory hit in return for better behavior on very large
sets of tags.

It certainly doesn't hurt on smaller tag sets (like git.git): we wind
up with roughly one tag per bucket, so lookups are effectively
constant time.

-- 
Shawn.
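
[Editor's note: the following is a rough, self-contained sketch of the
combined scheme being discussed -- 256 buckets indexed by the first
byte of the tagged commit's SHA-1, each bucket kept as a sorted array
so the per-commit lookup is one array index plus a binary search.  It
is illustrative only; the struct and function names are invented and
this is not the actual builtin-describe.c code.]

#include <stdlib.h>
#include <string.h>

struct tag_entry {
	unsigned char sha1[20];	/* object name of the tagged commit */
	const char *name;	/* tag name, e.g. "v1.5.0" */
};

struct bucket {
	struct tag_entry *entries;	/* sorted by sha1 once loaded */
	size_t nr, alloc;
};

/* One pointer-sized bucket per possible first byte of the SHA-1. */
static struct bucket buckets[256];

static int cmp_sha1(const void *a, const void *b)
{
	const struct tag_entry *ta = a, *tb = b;
	return memcmp(ta->sha1, tb->sha1, 20);
}

/* Append a tag; error handling omitted for brevity. */
static void add_tag(const unsigned char *sha1, const char *name)
{
	struct bucket *b = &buckets[sha1[0]];
	if (b->nr == b->alloc) {
		b->alloc = b->alloc ? b->alloc * 2 : 4;
		b->entries = realloc(b->entries,
				     b->alloc * sizeof(*b->entries));
	}
	memcpy(b->entries[b->nr].sha1, sha1, 20);
	b->entries[b->nr].name = name;
	b->nr++;
}

/* Sort every bucket once, before the history traversal starts. */
static void sort_buckets(void)
{
	int i;
	for (i = 0; i < 256; i++)
		qsort(buckets[i].entries, buckets[i].nr,
		      sizeof(struct tag_entry), cmp_sha1);
}

/* Called once per commit visited: one index, then bsearch within a
 * bucket that holds roughly 1/256 of all the tags. */
static const char *lookup_tag(const unsigned char *sha1)
{
	struct bucket *b = &buckets[sha1[0]];
	struct tag_entry key, *hit;

	memcpy(key.sha1, sha1, 20);
	hit = bsearch(&key, b->entries, b->nr,
		      sizeof(struct tag_entry), cmp_sha1);
	return hit ? hit->name : NULL;
}

With several thousand tags this keeps each bsearch over only a handful
of entries per bucket; with a small tag set like git.git most buckets
hold zero or one entry, so the extra indexing costs essentially nothing.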