On Jul 14, 2011, at 21:19, Linus Torvalds wrote:

> But dammit, if you start using generation numbers, then they *are*
> required information. The fact that you then hide them in some
> unarchitected random file doesn't change anything! It just makes it
> ugly and random, for chrissake!

Generation numbers will never be required information, because we can always compute them. These numbers are really much more similar to other pack index information than anything else.

<aside>
Sometimes I wish we had general "depth" information for each SHA1, which would be the maximum number of steps in the DAG to reach a leaf. This way, if we want to do something like "git log drivers/net/slip.c", we don't have to bother reading the majority of trees that have a depth less than two.

The depth can also be used as a limiter for "contains" operations, where we want to see if commit X contains commit Y: depth(X) has to be at least depth(Y).

However, any such notion, whether generation or depth or whatever else we'll think of tomorrow, is something particular to a certain implementation of git. It does not add anything to the information we store.
</aside>

I don't think my commit should have a different SHA1 from yours because your tree has more generation numbers than mine. The beauty and genius of GIT is that it just takes the minimum amount of data needed to uniquely identify the information to be stored, and stores that in a UNIQUE format. By allowing generation numbers to either be present or absent, that's all broken. It's like computing the SHA1 of compressed data: the result depends not on the data we store, but on the particular representation we choose. Fortunately we have done away with the first mistake.

So, if you're going to add generation numbers, there has to be a flag day, after which generation numbers are required everywhere.
Of course it would be possible to recognize "old style" commits and convert them on the fly, but that is true for pretty much any format change. However, adding redundant information seems like a poor excuse for having a flag day.

Storing generation data in pack indices, on the other hand, makes perfect sense: when we generate these indices, we do complete traversals and have all required information trivially at hand. We can never have that many loose objects, so lack of generation information there isn't a big deal. By storing generation information in the index, we can be sure it is consistent with the data contained in the pack, so there are no cache invalidation issues.

I know I must have missed some stupid and obvious reason why this is all wrong, I just don't quite see it yet.

-Geert
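
To make the "we can always compute them" point concrete, here is a minimal sketch in Python. It is not git code: the `parents` dict standing in for the commit DAG, the function names `compute_depths` and `contains`, and the convention that a root commit has depth 0 are all assumptions made for illustration. It derives the depth of every commit purely from the parent links, then uses it as the limiter for a "contains" check, where depth(X) has to be at least depth(Y).

```python
from collections import deque


def compute_depths(parents):
    """Derive a depth for every commit from the parent links alone.

    `parents` is a hypothetical in-memory DAG: commit id -> list of parent ids.
    Assumed convention: a root commit has depth 0, any other commit has
    1 + the maximum depth of its parents.
    """
    depth = {}
    for start in parents:
        stack = [start]  # explicit stack, so long histories don't hit recursion limits
        while stack:
            commit = stack[-1]
            if commit in depth:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in depth]
            if missing:
                # Parents must be resolved before the commit itself.
                stack.extend(missing)
            else:
                depth[commit] = 1 + max((depth[p] for p in parents[commit]), default=-1)
                stack.pop()
    return depth


def contains(parents, depth, x, y):
    """Return True if commit y is reachable from commit x (x "contains" y)."""
    if depth[x] < depth[y]:
        return False  # the limiter: x cannot contain anything deeper than itself
    seen = {x}
    queue = deque([x])
    while queue:
        commit = queue.popleft()
        if commit == y:
            return True
        for p in parents[commit]:
            # Ancestors shallower than y can never lead back to y, so skip them.
            if p not in seen and depth[p] >= depth[y]:
                seen.add(p)
                queue.append(p)
    return False
```

Nothing here needs to be stored in the commit objects themselves. The depth table is derived data that could be rebuilt at any time from the parent links, which is why it would sit comfortably next to the other information in a pack index.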