On Thu, Jul 14, 2011 at 1:31 PM, Jeff King <peff@xxxxxxxx> wrote:
>
> However, I'm not 100% convinced leaving generation numbers out was a
> mistake. The git philosophy seems always to have been to keep the
> minimal required information in the DAG.

Yes. And until I saw the patches trying to add generation numbers, I didn't really try to push adding generation numbers to commits (although it actually came up as early as July 2005, so the "let's use generation numbers in commits" thing is *really* old).

In other words, I do agree that we should strive for minimal required information.

But dammit, if you start using generation numbers, then they *are* required information. The fact that you then hide them in some unarchitected random file doesn't change anything! It just makes it ugly and random, for chrissake!

I really don't understand your logic that says that the cache is somehow cleaner. It's a random hack! It's saying "we don't have it in the main data structure, so let's add it to some other one instead, and now we have a consistency and cache generation problem instead".

Just look at the size of the patches in question. Your caching patches are bigger and more complicated. Sure, part of it is that your series adds the code to _use_ the generation number, but look purely at the code to maintain them.

Why do you think the odd separate cache is somehow better than just doing it right? Seriously?

If we require the generation numbers, then they have *become* that minimal information that we should save!

> And I think that has served us
> well, because we're not saddled with cruft that seemed like a good idea
> early on, but isn't.

Again - we discussed adding generation numbers about 6 years ago. We clearly *should* have done it. Instead, we went with the hacky "let's use commit time", that everybody really knew was technically wrong, and was a hack, but avoided the need.
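To make the commit-time point concrete, here is a minimal Python sketch. It is a toy model, not git's code: the `Commit` class and every helper name are hypothetical, and the date cutoff is a simplified stand-in rather than git's actual heuristics. A generation number is 1 for a root commit and 1 + the maximum of the parents' generation numbers otherwise. Computing it from scratch therefore means visiting every ancestor, while a value stored with each commit only needs a look at the parents. The sketch asks "is `a` an ancestor of `b`?" and prunes the walk either by commit date or by generation; one commit made on a machine with a slow clock is enough to make the date cutoff give the wrong answer.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Toy commit: hypothetical stand-in, not git's object format."""
    name: str
    parents: list  # parent Commit objects
    timestamp: int  # committer date; may be wrong if a clock was skewed
    generation: int = 0  # filled in by assign_generations()


def assign_generations(commits):
    """Roots get 1; every other commit gets 1 + max(parent generations).

    `commits` must be in topological order (parents before children).
    Starting from nothing, this has to visit the whole history.
    """
    for c in commits:
        c.generation = 1 + max((p.generation for p in c.parents), default=0)


def prune_by_date(c, target):
    # Heuristic: assumes no commit is older than any of its ancestors.
    return c.timestamp < target.timestamp


def prune_by_generation(c, target):
    # Exact: every ancestor of c has a strictly smaller generation than c,
    # so once c.generation <= target.generation, target cannot be below c.
    return c.generation <= target.generation


def is_ancestor(target, start, prune):
    """Walk back from `start` looking for `target`, skipping pruned commits."""
    stack, seen = [start], set()
    while stack:
        c = stack.pop()
        if c is target:
            return True
        if c.name in seen or prune(c, target):
            continue
        seen.add(c.name)
        stack.extend(c.parents)
    return False


# root <- a <- x <- b, where x was committed on a machine with a slow clock.
root = Commit("root", [], 100)
a = Commit("a", [root], 200)
x = Commit("x", [a], 150)
b = Commit("b", [x], 300)
assign_generations([root, a, x, b])

print(is_ancestor(a, b, prune_by_date))        # False (wrong: x looks older than a)
print(is_ancestor(a, b, prune_by_generation))  # True
```

Because every ancestor of a commit has a strictly smaller generation number, the generation cutoff never discards a path that could still reach `a`; the date cutoff is only correct if every committer's clock was right.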
Now, six years later, you clearly are saying that we need the generation numbers, but then you go off and try to say that they should be in some secondary non-architected random collection of data structures that isn't covered by the security and maintenance guarantees that the core git objects are.

Dammit, one of the things that makes git special is that the data structures are NOT random odd ad-hoc files. There is a design to them.

> Generation numbers are _completely_ redundant with the actual structure
> of history represented by the parent pointers.

Not true. That's only true if you add ".. if you parse the whole history" to that statement.

And we've *never* parsed the whole history, because it's just too expensive and doesn't scale. So right now we depend on commit dates with a few hacks.

So no, generation numbers are not at all redundant. They are fundamental. It's why we had this discussion six years ago.

> And so that seems a bit hack-ish to me.

Um? If you feel that way, then why the hell are you pushing your EVEN MORE HACKISH CACHE PATCHES?

That's what this really boils down to. I think that if we have a value that we need, then it should be recorded. In the data structures. Not in some random other location that isn't part of the real git data structures.

We don't do caches in git, because we don't NEED to. Sure, gitk has its hacky cache, but that's not core functionality. I think it's a sign of good design that we can do a "find .git" and explain every single file, and show that it's all core functionality (again, with the exception of "gitk.cache", and I suspect that's because gitk is a script, not because of any really fundamental data issues), and explain it.

I think the *cache* is a hell of a lot more hacky than just doing it right.

> I liken it somewhat to the "don't store renames" debate.

That's total and utter bullshit.

Storing renames is *wrong*. I've explained a million times why it's wrong. Doing it is a disaster. I know.
I've used systems that did it. It's crap. It's fundamentally information that is actively misleading and WRONG.

It's not even that you can do rename detection at run-time, it's that you *HAVE* to do rename detection at run-time, because doing it at commit time is simply utterly and fundamentally *wrong*.

Just look at "git blame -C" to remind yourself why rename information is wrong. But even more importantly, look at git merges. Look at how git has gotten merging right since pretty much day #1, and has absolutely no issues with files that got generated two different ways. Look at every SCM that tries to do rename detection at commit time, and look at how THEY CANNOT DO MERGES RIGHT.

It's that simple. Rename detection is not about avoiding "redundant data". It's about doing the right thing.

Linus
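Run-time rename detection pairs deleted and added paths by content similarity when a diff is computed, instead of trusting anything recorded at commit time. A toy sketch follows, assuming Python's `difflib.SequenceMatcher` as the similarity measure. git's own rename detection uses a different similarity score, though, like this sketch, it defaults to a 50% threshold for `git diff -M`. The `detect_renames` function and its greedy pairing are hypothetical illustrations, not git's algorithm.

```python
import difflib


def detect_renames(deleted, added, threshold=0.5):
    """Pair deleted paths with added paths whose contents are similar.

    `deleted` and `added` map path -> file contents. Returns a list of
    (old_path, new_path, score) with score rounded to two decimals.
    """
    candidates = []
    for old, old_text in deleted.items():
        for new, new_text in added.items():
            score = difflib.SequenceMatcher(None, old_text, new_text).ratio()
            if score >= threshold:
                candidates.append((score, old, new))

    # Greedy pairing: take the best-scoring pairs first, use each path once.
    candidates.sort(reverse=True)
    used_old, used_new, renames = set(), set(), []
    for score, old, new in candidates:
        if old not in used_old and new not in used_new:
            used_old.add(old)
            used_new.add(new)
            renames.append((old, new, round(score, 2)))
    return renames


deleted = {"util.c": "int add(int a, int b)\n{\n\treturn a + b;\n}\n"}
added = {
    "lib/math.c": "int add(int a, int b)\n{\n\treturn b + a;\n}\n",  # moved, then edited
    "README": "hello\n",
}
print(detect_renames(deleted, added))
```

Because the pairing is recomputed from content every time, a file that was moved and then edited is still matched, and there is no recorded rename that can be wrong for the history it is later merged with.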