On Wed, 6 Jul 2011, Jeff King wrote:
> On Wed, Jul 06, 2011 at 11:01:03AM -0400, Ted Ts'o wrote:
>
> > Is it worth it to try to replicate this information across repositories?
>
> Probably not. I suggested notes-cache just because the amount of code is
> very trivial.

Well, generation numbers are universal and would help everybody. For new
commits with a 'generation' header they would always be replicated; for old
commits with 'generation' notes / notes-cache they can be replicated.

> One problem with notes storage is that it's not well optimized for tiny
> pieces of data like this (e.g., the generation number should fit in a
> 32-bit unsigned int, as its max is the size of the longest single path
> in the history graph). But notes are much more general; we will actually
> map each commit to a blob object containing the generation number, which
> is pretty wasteful.

Wasn't textconv-cache using commit-less notes? The same can be done for
a generation notes-cache. Though it is still wasteful...

By the way, would we be using a text representation (like in the
'generation' commit header), a 32-bit integer binary representation in
some byte ordering, or a variable-length integer (I think git uses them
somewhere)?

NB: I wonder if a 32-bit unsigned int would always be enough, for example
for the Linux kernel + history.

> > Why not just simply have a cache file in the git directory which is
> > managed somewhat like gitk.cache; call it generation.cache?
>
> Yeah, that would be fine. With a sorted list of binary sha1s and 32-bit
> generation numbers, you're talking about 24 bytes per commit. Or a 6
> megabyte cache for linux-2.6.
>
> You'd probably want to be a little clever with updates. If I have
> calculated the generation number of every commit, and then do "git
> commit; git tag --contains HEAD", you probably don't want to rewrite the
> entire cache.
> You could probably journal a fixed number of entries in an
> unsorted file (or even in a parallel directory structure to loose
> objects), and then periodically write out the whole sorted list when the
> journal gets too big. Or choose a more clever data structure that can do
> in-place updates.

And that is the difference between gitk.cache (generated _once_ when
starting gitk, and regenerated on request) and the idea of
generation.cache, which would need updating as commits are added.

I think it would be simpler to use the generation header + generation
notes. Or start with generation notes only.

--
Jakub Narebski
Poland
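
To make the generation.cache layout discussed above concrete, here is a minimal sketch in Python (git itself is C; this is only an illustration, not proposed code). It assumes that a root commit has generation 0 and that every other commit has 1 + the largest generation among its parents, and it packs each entry as a 20-byte binary sha1 followed by a big-endian 32-bit unsigned integer, which gives the 24 bytes per commit mentioned above. The names generations(), pack_cache(), lookup() and fake_sha() are made up for this example, and the journal for incremental updates is not covered.

```python
import hashlib
import struct

# One cache record: 20-byte binary SHA-1 + big-endian 32-bit unsigned generation.
RECORD = struct.Struct(">20sI")


def generations(parents):
    """Map each commit to its generation number.

    Assumption for this sketch: roots get 0, every other commit gets
    1 + max(generation of its parents). `parents` maps commit -> list of
    parent commits and must describe an acyclic history.
    """
    gen = {}
    for start in parents:
        stack = [start]  # explicit stack, so deep histories do not hit the recursion limit
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                stack.extend(missing)  # compute the parents first
            else:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=-1)
                stack.pop()
    return gen


def pack_cache(gen):
    """Serialize the mapping as fixed-size records sorted by binary SHA-1."""
    return b"".join(RECORD.pack(sha, g) for sha, g in sorted(gen.items()))


def lookup(cache, sha):
    """Binary-search the sorted records; return the generation, or None if absent."""
    lo, hi = 0, len(cache) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, g = RECORD.unpack_from(cache, mid * RECORD.size)
        if key == sha:
            return g
        if key < sha:
            lo = mid + 1
        else:
            hi = mid
    return None


def fake_sha(name):
    """Stand-in for a real commit id (hypothetical helper for the example)."""
    return hashlib.sha1(name.encode()).digest()


# Tiny history: a is the root, b and c are children of a, d merges b and c.
a, b, c, d = (fake_sha(n) for n in "abcd")
history = {a: [], b: [a], c: [a], d: [b, c]}
gen = generations(history)
cache = pack_cache(gen)
```

With this history, lookup(cache, d) gives 2, because the merge d sits one step above b and c, which are each one step above the root. Four commits take 4 * 24 = 96 bytes; a text or variable-length encoding would need a different record layout and would lose the fixed-stride binary search.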