Re: generation numbers (was: [PATCH 0/4] Speed up git tag --contains)

Jeff King <peff@xxxxxxxx> · Wed, 6 Jul 2011 14:12:00 -0400

On Wed, Jul 06, 2011 at 11:01:03AM -0400, Ted Ts'o wrote:

> Is it worth it to try to replicate this information across repositories?

Probably not. I suggested notes-cache only because the amount of code
needed is trivial.

One problem with notes storage is that it's not well optimized for tiny
pieces of data like this (e.g., the generation number should fit in a
32-bit unsigned int, as its max is the size of the longest single path
in the history graph). Notes are much more general, so we would actually
map each commit to a blob object containing the generation number, which
is pretty wasteful.
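
To make the size bound concrete, here is a minimal sketch of how
generation numbers could be computed. It assumes a root commit has
generation 1 and every other commit has one more than the largest
generation among its parents, so the maximum value equals the number of
commits on the longest path. The `parents` mapping and the function name
are hypothetical, not git's actual code.

```python
def generation_numbers(parents):
    """Return {commit: generation} for a history DAG.

    parents maps each commit id to a list of its parent ids
    (hypothetical input format; root commits map to an empty list).
    """
    gen = {}
    for start in parents:
        # Explicit stack instead of recursion, so long histories do
        # not hit the interpreter's recursion limit.
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            pending = [p for p in parents[commit] if p not in gen]
            if pending:
                # Resolve the parents first, then revisit this commit.
                stack.extend(pending)
            else:
                # Assumed convention: roots get 1, others 1 + max(parents).
                gen[commit] = 1 + max(
                    (gen[p] for p in parents[commit]), default=0
                )
                stack.pop()
    return gen
```

Under that convention no value can exceed the number of commits in the
repository, which is why a 32-bit unsigned int is plenty.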

> Why not just simply have a cache file in the git directory which is
> managed somewhat like gitk.cache; call it generation.cache?

Yeah, that would be fine. With a sorted list of binary sha1s and 32-bit
generation numbers, you're talking about 24 bytes per commit (a 20-byte
sha1 plus a 4-byte generation number), or about a 6 megabyte cache for
linux-2.6.
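
A rough sketch of what such a file could look like, assuming fixed-size
records of a 20-byte binary sha1 followed by a big-endian 32-bit
generation number, sorted by sha1 so a lookup is a binary search. The
byte order, the absence of a file header, and the function names are
illustrative assumptions, not an existing git format.

```python
import struct

# One record: 20-byte binary sha1 + unsigned 32-bit generation number.
# ">" selects big-endian with no padding, so the size is exactly 24.
RECORD = struct.Struct(">20sI")
assert RECORD.size == 24


def pack_cache(gens):
    """Serialize {sha1_bytes: generation} as records sorted by sha1."""
    return b"".join(RECORD.pack(sha, gen) for sha, gen in sorted(gens.items()))


def lookup(cache, sha):
    """Binary-search the packed cache; return the generation or None."""
    lo, hi = 0, len(cache) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, gen = RECORD.unpack_from(cache, mid * RECORD.size)
        if key < sha:
            lo = mid + 1
        elif key > sha:
            hi = mid
        else:
            return gen
    return None
```

At 24 bytes per record, a 6 megabyte file holds on the order of 250,000
commits, and a lookup touches about 18 records.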

You'd probably want to be a little clever with updates. If I have
calculated the generation number of every commit, and then do "git
commit; git tag --contains HEAD", you don't want to rewrite the entire
cache just to add one entry. You could journal a fixed number of entries
in an unsorted file (or even in a parallel directory structure to loose
objects), and then periodically write out the whole sorted list when the
journal gets too big. Or choose a more clever data structure that can do
in-place updates.
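
The journal idea could be sketched roughly as below. This is an
in-memory model only: the class name, the `journal_limit` parameter, and
the `rewrites` counter are hypothetical, and a real version would append
to an unsorted file and rewrite the sorted file on disk.

```python
from bisect import bisect_left


class GenerationCache:
    """Sorted main list plus a small unsorted journal of recent entries."""

    def __init__(self, journal_limit=64):
        self.sorted_keys = []   # sha1s from the last full rewrite, sorted
        self.sorted_gens = []   # generation numbers, parallel to sorted_keys
        self.journal = []       # unsorted (sha1, generation) pairs
        self.journal_limit = journal_limit
        self.rewrites = 0       # how many full rewrites have happened

    def add(self, sha, gen):
        # Cheap append; the sorted list is untouched for most updates.
        self.journal.append((sha, gen))
        if len(self.journal) > self.journal_limit:
            self.compact()

    def compact(self):
        # Fold the journal into the sorted list: the expensive full rewrite.
        merged = dict(zip(self.sorted_keys, self.sorted_gens))
        merged.update(self.journal)
        self.sorted_keys = sorted(merged)
        self.sorted_gens = [merged[k] for k in self.sorted_keys]
        self.journal = []
        self.rewrites += 1

    def lookup(self, sha):
        # Linear scan of the journal is fine because its size is bounded.
        for key, gen in self.journal:
            if key == sha:
                return gen
        i = bisect_left(self.sorted_keys, sha)
        if i < len(self.sorted_keys) and self.sorted_keys[i] == sha:
            return self.sorted_gens[i]
        return None
```

With this shape, a single new commit costs one append, and the full
rewrite is amortized over `journal_limit` additions.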

-Peff