> So the generation number really is very very fundamnetal. It's > absolutely not some "additional information that can be computed", > because the whole AND ONLY point of having the number is to not > compute it. > > We are never interested in the generation number for its own sake. We > are only interested in it in order to avoid having to look at the rest > of the DAG. You're making my point and somehow not seeing it. What you're describing here is the archetpical cache. The only reason for having a memory cache is to avoid accessing memory! The only reason for having a TLB is to avoid walking the page tables! The only reason for having a page cache is to avoid hitting the disk! The only reason for having a dcache is to avoid traversing the file system directories! And yes, the only reason for having a generation number cache is to avoid traversing the DAG. D'oh. Do you think this is somehow news to anyone? The fundamental nature of a cache is that it lets you look something up quickly that you could compute but don't want to. I'm slapping my forehead like Homer Simpson here. The fact that computing the generation number is expensive is why it's worth cacheing. But the fact that it *can* be computed is a reason not to clutter the published commit object format with it. The generation number is NOT FUNDAMENTAL. It contains no information that's not already in the DAG. The danger of putting it into a commit is that you'll do it wrong, and thereby screw everything up. If we have broken code that generates a broken cache, we fix the code and the bugs magically go away. If we have broken code that generates a broken commit object, we have a huge problem. Just like we don't ship pack indexes around, but recompute them on arrival. The index is essential for performance, but it's absolutely non-essential for correctness. As a general design principle, the exported data structures, like the commits, should be as simple as possible. Do not include extraneous or redundant data, because then you have to deal with the possibility of inconsistency. This leads to bugs. (Frequently buffer overflow bugs.) Maybe it would have been worth violating that principle during the initial git design. I still see a good argument for not doing that even if we had a time machine. But now that the commit format is established and widely used, the argument has far more force. Changing the commit format provides zero functionality gain, and the performance gain can be obtained a different way. Maybe a bit more code, but nothing extraordinary. To me, the KISS principle says "don't change the commit format!" Now, you complain about code complexity. But this is a read-only cache. The generation number of a commit object never changes. There's no update operation. Like an I-cache, if there's ever any problem, throw it away. Arguing that "the patch to put it in the commit object is smaller" is stupidly short-sighted. Now every version of git from now until forever has to support both kinds of commit objects. (And browsing old git trees will forever be slow.) You only take on that sort of legacy support burden if you absolutely have to. > But "just because we could recompute it" is a bad bad reason. Bull puckey. You're ugly and stupid and WRONG. It's an excellent reason. I'm amazed that you're not seeing it. The principle is "don't include redundant data in a transport format." Because it can be recomputed, it's redundant. Therefore, it shouldn't be included in the transport format. It's exactly the same principle as "don't store the indexes in the database dump" and "don't store filename hashes in file system archives". This is a principle, not an iron-clad rule. It can be violated for good and sufficient reasons, notably performance. But in this case, we can get the performance without it. Without, in fact, changing the git transport format at all. And "don't change a widely-used transport format" is ANOTHER important principle. Backward-compatible is much better than incompatible, but far better to avoid changing it at all. Breaking two such principles without an absolutely iron-clad reason is ugly and stupid and wrong. (As you well know, the more general principle is "don't store redundant data AT ALL unless you need to for performance". Redundant data is A Bad Thing. It can get out of sync. But if you have to, a private cache is much better than a exchange format.) Put another way, it IS stupid, it IS expendable, and therefore it SHOULD go. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html