On Fri, 20 Oct 2006, Aaron Bentley wrote:
>
> Linus Torvalds wrote:
> > Git goes one step further: it _really_ doesn't matter how you got to
> > a certain state. Absolutely _none_ of the commits in between the
> > final states and the common ancestor matter in the least. The only
> > thing that matters is what the states at the end-points are.
>
> That's interesting, because I've always thought one of the strengths of
> file-ids was that you only had to worry about end-points, not how you
> got there.
>
> How do you handle renames without looking at the history?

You first handle all the non-renames that just merge on their own. That
takes care of 99.99% of the stuff (and I'm not exaggerating: in the
kernel, you have ~21000 files, and most merges don't have a single rename
to worry about - and even when you do have them, they tend to be in the
"you can count them on one hand" kind of situation).

Then you just look at all the pathnames you _couldn't_ resolve, and that
usually cuts the problem down to something where you can literally afford
to use a lot of CPU power per file, because now you only have a small
number of candidates left.

If you were to spend one hundredth of a second per file regardless of
file, a stupid per-file merge would take 210 seconds, which is just
unacceptable. So you really don't want to do that. You want to merge
whole subdirectories in one go (and with git, you can: since the SHA1 of
a directory defines _all_ of the contents under it, if the two branches
you merge have an identical subdirectory, you don't need to do anything
at _all_ about that one. See?).

So instead of trying to be really fast on individual files and doing them
one at a time, git makes individual files basically totally free (you
literally often don't need to look at them AT ALL). And then for the few
files you can't resolve, you can afford to spend more time. So say that
you spend one second per file-pair because you do complex heuristics
etc - you'll still have a merge that is a _lot_ faster than your
210-second one.

So the recursive merge strategy basically generates the matrix of
similarity for the new/deleted files, and tries to match them up, and
there you have your renames - without ever looking at the history of how
you ended up where you are.

Btw, that "210 second" merge is not at all unlikely. Some of the SCM's
seem to scale much worse than that to big archives, and I've heard people
talk about merges that took 20 minutes or more. In contrast, git doing a
merge in ~2-3 seconds for the kernel is _normal_.

[ In fact, I just re-tested doing my last kernel merge: it took 0.970
  seconds, and that was _including_ the diffstat of the result - though
  obviously not including the time to fetch the other branch over the
  network. I don't know if people appreciate how good it is to do a merge
  of two 21000-file branches in less than a second.

  It didn't have any renames, and it only had a single well-defined
  common parent, but not only is that the common case, being that fast
  for the simple case is what _allows_ you to do well on the complex
  cases too, because it's what gets rid of all the files you should
  _not_ worry about ]

Performance does matter.

		Linus
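
To make the "merge whole subdirectories in one go" point concrete, here is a
minimal sketch of a three-way tree merge that takes entire subtrees as-is
whenever their hashes already agree. This is not git's actual code or data
structures (git stores real tree objects and uses its own object hashing);
the toy dict-based trees and helper names here are made up purely for
illustration.

```python
import hashlib

# Toy "tree object": a dict mapping names to either bytes (a file's
# contents) or another dict (a subtree).  Purely illustrative.

def tree_hash(tree):
    """Hash a tree so that identical contents give identical hashes."""
    h = hashlib.sha1()
    for name in sorted(tree):
        entry = tree[name]
        payload = entry if isinstance(entry, bytes) else tree_hash(entry)
        h.update(name.encode() + b"\0" + payload)
    return h.digest()

def merge_trees(base, ours, theirs, path="", unresolved=None):
    """Three-way merge of toy trees.

    Subtrees whose hashes already match are taken without descending
    into them at all.  Anything that can't be resolved trivially is
    collected in `unresolved` for a slower path (rename detection etc).
    """
    if unresolved is None:
        unresolved = []
    result = {}
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        sub = path + "/" + name if path else name

        def h(x):
            if x is None:
                return None
            return x if isinstance(x, bytes) else tree_hash(x)

        if h(o) == h(t):                 # both sides agree: take it, done
            if o is not None:
                result[name] = o
        elif h(b) == h(o):               # only "theirs" changed it
            if t is not None:
                result[name] = t
        elif h(b) == h(t):               # only "ours" changed it
            if o is not None:
                result[name] = o
        elif all(isinstance(x, dict) for x in (b, o, t) if x is not None):
            # Both sides touched this subtree differently: recurse, but
            # only into this one subtree.
            result[name] = merge_trees(b or {}, o or {}, t or {},
                                       sub, unresolved)
        else:
            unresolved.append(sub)       # leave it for the expensive path
    return result
```

The point of the sketch is the control flow: the common case falls out in the
first three branches by comparing hashes alone, so the expensive per-file work
only ever runs on the handful of paths that end up in `unresolved`.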
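The rename heuristic for those leftover paths can be sketched the same way:
score every (deleted, added) pair for content similarity and greedily pair up
the best matches. Again this is only an illustration of the idea, not git's
implementation - git scores similarity with its own much faster heuristic,
and the names, threshold, and use of difflib here are stand-ins.

```python
from difflib import SequenceMatcher

def similarity(old_blob: bytes, new_blob: bytes) -> float:
    """Crude content similarity in [0, 1].  A stand-in for git's own
    delta-based similarity scoring."""
    return SequenceMatcher(None, old_blob, new_blob).ratio()

def detect_renames(deleted, added, threshold=0.5):
    """Pair up deleted and added files that look like renames.

    `deleted` and `added` map path -> contents for the files the trivial
    tree merge could not resolve.  Because that set is tiny, building
    the full O(n*m) similarity matrix is perfectly affordable.
    """
    scores = [
        (similarity(old, new), old_path, new_path)
        for old_path, old in deleted.items()
        for new_path, new in added.items()
    ]
    renames, used_old, used_new = [], set(), set()
    # Greedily take the best-scoring pairs above the threshold.
    for score, old_path, new_path in sorted(scores, reverse=True):
        if score < threshold:
            break
        if old_path in used_old or new_path in used_new:
            continue
        renames.append((old_path, new_path, score))
        used_old.add(old_path)
        used_new.add(new_path)
    return renames

# Example: one file moved and lightly edited, one genuinely new file.
deleted = {"lib/old_name.c": b"int f(void) { return 42; }\n"}
added = {
    "lib/new_name.c": b"int f(void) { return 42; /* moved */ }\n",
    "lib/other.c": b"completely different contents\n",
}
print(detect_renames(deleted, added))
# -> only old_name.c -> new_name.c is reported as a rename
```

Because the similarity matrix is only built over the unresolved candidates,
spending even a second per pair stays cheap, which is exactly the trade-off
described above.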