On Tue, Dec 02, 2014 at 09:57:07AM -0800, Junio C Hamano wrote: > > To get a rough sense of how much effort is entailed in the various > > options, here are "git log --raw" timings for git.git (all timings are > > warm cache, best-of-five, wall clock time): > > The rationale of the change talks about "big projects" and your > analysis and argument to advocate reversion of that change is based > on "git.git"? What is going on here? I find that git.git is often a useful and easy thing to time to extrapolate to other projects. It's 1/10th-1/20th the size of the kernel (both in tree size and commit depth), which I do consider a "big project" (and I have a feeling is what Linus was talking about). Of course, performance numbers do not always scale linearly with repo size. I didn't show the full numbers for the kernel, but they are: log --raw: 0m53.587s log --raw -M: 0m55.424s log --raw -C: 1m02.733s log --raw -C -C: <killed after 10 minutes> There are ~20K commits that introduce files in the kernel (about 10x what git.git had). So renames add well under a millisecond to each of those diffs, and a single "-C" adds a third of a millisecond. Which is pretty in-line with the git.git findings (it is not linear here, but actually fairly constant. This makes sense, as it scales with the size of the commit, not the size of the tree). And as I noted, "-C -C" is rather expensive (I gave some estimated timings earlier; you could probably come up with something more accurate by doing smarter sampling). > Also our history is fairly unusual in that we do not have a lot of > renames (there was one large "s|builtin-|builtin/|" rename event, > but that is about it), which may not be a good example to base such > a design decision on. I think the work scales not with the number of actual renames, but with the number of commits where we even bother to look at renames at all (i.e., ones with an 'A' diff-status). And my estimates assume that we pay zero cost for other diffs, and attribute all of the extra time to those diffs. So I think frequency of rename (or 'A') events does not impact the estimate of the impact on a single "git status" run. What does impact it is the _size_ of each commit. If you quite frequently add a new file while touching tens of thousands of other files, then the performance will be a lot more noticeable. And both git.git and linux.git are probably better than some other projects about having small commits. Still, though. I stand by my earlier conclusions. Even with commits ten times as large as the kernel's, you are talking about 3ms added to a "status" run (and again, only when you hit such a gigantic commit _and_ it has an 'A' file). > It is a fine idea to make this configurable ("status.renames = -C" > or something, perhaps?), though. I think it would be OK to move to "-C" as a default, but I agree it is nicer if it is configurable, as it gives a safety hatch for people in pathological repos to drop back to the old behavior (or even turn off renames altogether). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html