On Sun, Nov 30, 2014 at 12:54:53PM +1100, Bryan Turner wrote:

> I'll let someone a little more intimately familiar with the internals
> of git status comment on why the documentation for that mentions
> copies.

I don't think there is a good reason. git-status has used renames since
mid-2005. The documentation mentioning copies was added much later,
along with the short and porcelain formats. That code handles whatever
the diff engine throws at it. I don't think anybody considered at that
time the fact that you cannot actually provoke status to look for
copies.

Interestingly, the rename behavior dates all the way back to:

  commit 753fd78458b6d7d0e65ce0ebe7b62e1bc55f3992
  Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxx>
  Date:   Fri Jun 17 15:34:19 2005 -0700

      Use "-M" instead of "-C" for "git diff" and "git status"

      The "C" in "-C" may stand for "Cool", but it's also pretty slow,
      since right now it leaves all unmodified files to be tested even
      if there are no new files at all. That just ends up being
      unacceptably slow for big projects, especially if it's not all in
      the cache.

I suspect that the copy code may be much faster these days (it sounds
like we did not even have the find-copies-harder distinction then, and
these days we certainly take the quick return if there are no copy
destination candidates).

To get a rough sense of how much effort is entailed in the various
options, here are "git log --raw" timings for git.git (all timings are
warm cache, best-of-five, wall clock time):

  log --raw:       0m2.311s
  log --raw -M:    0m2.362s
  log --raw -C:    0m2.576s
  log --raw -C -C: 1m4.462s

You can see that rename detection adds a little, and copy detection
adds a little more. That makes sense; it's rare for new files to appear
at the same time that old files are going away (renames), so most of
the time it does nothing. Copies introduce a bit more work; we have to
compare against any changed files, and there are typically several in
each commit. find-copies-harder is...well, very expensive.

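For anyone who wants to compare the options without doing the
subtraction by hand, here is a rough Python sketch that parses the
time(1)-style figures quoted above and reports each option's added
cost over plain "log --raw". The helper names (parse_wall_clock,
overheads) and the TIMINGS table are made up for illustration; only the
numbers come from the timings above.

```python
import re


def parse_wall_clock(text):
    """Convert a time(1)-style string such as '1m4.462s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the inexact-detection timings quoted above.
TIMINGS = {
    "log --raw": "0m2.311s",
    "log --raw -M": "0m2.362s",
    "log --raw -C": "0m2.576s",
    "log --raw -C -C": "1m4.462s",
}


def overheads(timings, baseline="log --raw"):
    """Return seconds added by each command relative to the baseline."""
    base = parse_wall_clock(timings[baseline])
    return {
        cmd: parse_wall_clock(value) - base
        for cmd, value in timings.items()
        if cmd != baseline
    }


if __name__ == "__main__":
    for cmd, extra in overheads(TIMINGS).items():
        print(f"{cmd}: +{extra:.3f}s")
```

The same helper accepts the zero-padded form ("0m02.311s") used in the
exact-detection timings, so the table can be swapped out as needed.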
These timings are of diffs between commits and their parents, of
course. But if we assume that "git status" will show diffs roughly
similar to what gets committed, then this should be comparable. There
are about 30K non-merge commits we traversed there, so adding roughly
200ms averages out to very little per commit (under 1/100th of a
millisecond). Of course the cost is disproportionately borne by diffs
which have an actual file come into being. There are ~2000 commits that
introduce a file, so it's probably accurate to say that it either adds
nothing in most cases, or ~1/10th of a millisecond in others.

Note this is also doing inexact detection, which involves actually
looking at the contents of candidate blobs (whereas exact detection can
be done by comparing sha1s, which is very fast). If you set
diff.renamelimit to "1", then we do only exact detection. Here are
timings there:

  log --raw:       0m02.311s (for reference)
  log --raw -M:    0m02.337s
  log --raw -C:    0m02.347s
  log --raw -C -C: 0m24.419s

That speeds things up a fair bit, even for "-C" (we don't have to
access the blobs anymore, so I suspect the time is going to just
accessing all of the trees; normally diff does not descend into
subtrees that have the same sha1).

Of course, you probably wouldn't want to turn off inexact renames
completely. I suspect what you'd want is something like a
--find-copies-moderately option, where we look for cheap copies using
"-C", and then follow up with "-C -C" only using exact renames.

So from these timings, I'd conclude that:

  1. It's probably fine to turn on copies for "git status".

  2. It's probably even OK to use "-C -C" for some projects. Even
     though the extra 22s looks scary there, that's only 11ms per
     commit for git.git (remember, spread across 2000 commits). For
     linux.git, it's much, much worse. I killed my "-C -C" run after 10
     minutes, and it had only gone through 1/20th of the commits.
     Extrapolating, you're looking at 500ms or so added to a "git
     status" run.

So you'd almost certainly want this to be configurable.

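The per-commit figures above are just division, but here is a rough
Python sketch of the arithmetic in case anyone wants to plug in numbers
from their own repository. The function name per_commit_ms and the
constant names are made up for illustration; the commit counts are the
approximate ones quoted above, and the 0.2s figure is the rounded
"200ms" estimate rather than a measured value.

```python
def per_commit_ms(extra_seconds, commits):
    """Spread an added wall-clock cost evenly over `commits` diffs (ms)."""
    if commits <= 0:
        raise ValueError("commits must be positive")
    return extra_seconds * 1000.0 / commits


# Approximate counts quoted above for git.git.
NON_MERGE_COMMITS = 30_000
FILE_ADDING_COMMITS = 2_000

# Roughly 200ms added by inexact "-C" over the whole traversal.
INEXACT_COPY_EXTRA = 0.2
# Exact-only "-C -C" (24.419s) minus the plain "log --raw" baseline (2.311s).
EXACT_HARDER_EXTRA = 24.419 - 2.311

if __name__ == "__main__":
    avg = per_commit_ms(INEXACT_COPY_EXTRA, NON_MERGE_COMMITS)
    adds = per_commit_ms(INEXACT_COPY_EXTRA, FILE_ADDING_COMMITS)
    harder = per_commit_ms(EXACT_HARDER_EXTRA, FILE_ADDING_COMMITS)
    print(f"-C, averaged over all commits:      {avg:.4f} ms")
    print(f"-C, over file-adding commits only:  {adds:.4f} ms")
    print(f"exact -C -C, file-adding commits:   {harder:.3f} ms")
```

This assumes the cost is spread evenly across the commits in question,
which is only a crude model; individual diffs will vary a lot.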
Does either of you want to try your hand at a patch? Just enabling
copies should be a one-liner. Making it configurable is more involved,
but should also be pretty straightforward.

-Peff