On Thu, Jul 14, 2011 at 11:37 AM, Jeff King <peff@xxxxxxxx> wrote:
>
> I'd love to have in-commit generation numbers. I'm just not sure we can
> get the speeds we want without caching them for existing commits.

So my argument would be that we'd simply be much better off fixing the
fundamental data structure (which we can), and let it become the
long-term solution.

Now, it *may* turn out that we'd want to have some cache for generation
numbers in commits that don't have them, but I absolutely think that
that should be an "add-on" rather than anything fundamental.

For example, if we just merge the "add generation numbers to the commit
object" logic first, then the "cache" case never really needs to care
about us generating new commits. They simply won't need the cache.

Also, I suspect that the cache could easily be done as a *small* and
*incomplete* cache, i.e. you don't need to cache all commits, it would be
sufficient to cache a few hundred spread-out commits, and just know
that "from any commit, a cached commit will be quickly reachable".

> I'm not sure that is the best plan. Calculating generation numbers
> involves going to all roots. So once you have to find any generation
> number, it's going to be expensive, no matter how many recent commits
> have generation numbers already in them (but it won't get _more_
> expensive as more commits are added; you'll always be traversing from
> the commit in question down to the roots).

It only ends up being expensive if the commit has parents that don't
have generation numbers. That's a fairly short-term problem.

For the kernel, for example, basically no development happens on a
base that is older than one or two releases. So if I (and Greg, with
the stable tree) start using my patch, within a couple of weeks,
pretty much all development would have a generation number in its
history. Sure, sometimes I'd merge from people who based their tree on
something old, and I'd end up calculating it all.
But it would get progressively rarer.

> As we add new commits with generation numbers, we won't need to do a
> calculation to get their numbers. But if you are doing something like
> "tag --contains", you are going to want to know the generation number of
> old tags (otherwise, you can't know whether your cutoff might hit them
> or not). IOW, even if we add generation numbers _today_, every "tag
> --contains" in linux-2.6 is going to end up traversing from v3.0-rc7
> down to the roots to get its generation number (v3.0-rc8 would get an
> embedded generation, of course).

So that could easily be handled by caching. In fact, I suspect that
you could make the cache not be associated with commit IDs at all, but
with the tags and heads.

But again, then the cache would be a "secondary" issue, not something
fundamental.

> So if you aren't going to cache generation numbers, then you might as
> well write your traversal algorithm to assume you don't know them for
> old commits.

But that's how our algorithms are *already* written. So why not have
that as the fallback? You get the advantage of generation numbers only
with modern things, but those are the ones you actually tend to use.
Merge bases are *very* seldom historical, for example.

                    Linus
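
To make the "embedded number first, walk only as a fallback" idea a bit
more concrete, here is a rough sketch. It is Python, not git code, and it
is not the actual patch: the Commit class, its field names, the
generation() helper and the "roots count as generation 1" convention are
all assumptions made up for illustration. The cache here only ever holds
numbers for old commits that lack an embedded one, which is the "add-on"
role described above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Commit:
    # Hypothetical in-memory commit: parent ids plus an optional embedded
    # generation number (None stands for an old commit without one).
    parents: Tuple[str, ...] = ()
    generation: Optional[int] = None


def generation(graph: Dict[str, Commit], start: str, cache: Dict[str, int]) -> int:
    """Generation of `start`: 1 for a root, else 1 + max over its parents.

    An embedded number is used as-is. Only commits without one are walked,
    and only their computed numbers end up in `cache`.
    """
    def known(cid: str) -> Optional[int]:
        embedded = graph[cid].generation
        return embedded if embedded is not None else cache.get(cid)

    stack = [start]  # explicit stack, so deep histories don't hit recursion limits
    while stack:
        cid = stack[-1]
        if known(cid) is not None:
            stack.pop()
            continue
        pending = [p for p in graph[cid].parents if known(p) is None]
        if pending:
            # Resolve parents that lack a number before coming back to cid.
            stack.extend(pending)
            continue
        cache[cid] = 1 + max((known(p) for p in graph[cid].parents), default=0)
        stack.pop()
    return known(start)


# Old history (a, b) has no embedded numbers; newer commits (c, d) do.
graph = {
    "a": Commit(),
    "b": Commit(parents=("a",)),
    "c": Commit(parents=("b",), generation=3),
    "d": Commit(parents=("c",), generation=4),
    "e": Commit(parents=("c", "b")),  # merge of new work with an old base
}
cache: Dict[str, int] = {}
print(generation(graph, "d", cache))  # embedded number, no walk needed
print(generation(graph, "e", cache))  # walks b and a once, then caches them
```

Asking about "d" never touches the cache, while the merge "e" has to walk
the old base once. After that the old commits are cached and the cost is
not paid again.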