On Mon, Aug 29, 2011 at 9:48 PM, Marat Radchenko <marat@xxxxxxxxxxxxxxxx> wrote:
> Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:
>> - is "file" above at top repo, or is it actually very/deep/path/to/a/file?
> 3 levels deep. Most parent dir (one after repo root) contains 20k files.
>
>> - how many entries in the tree that contain "file"?
> Sorry, didn't understand this.

You have already answered it. I was asking about the size of the parent
dir, but phrased it poorly.

>> - how is "git ls-files | wc -l"?
> $ time git ls-files | wc -l
> 603137
>
> real 0m0.417s
> user 0m0.440s
> sys 0m0.060s
>
>> - how about "time git diff branch another-branch -- file >/dev/null"?
>> That'd remove unpack-trees code.
> Pretty fast:
>
> git diff HEAD branch -- file > /dev/null
>
> real 0m0.276s
> user 0m0.240s
> sys 0m0.030s

That may explain it. "git diff <ref>" walks through the index, unpacks
tree objects along the way, matches up entries with the same path from
the branch and the index, then feeds the matching entries to the diff
function. If tree cutting is not done efficiently, it could very well
walk through every entry in the index (~600k entries in your case),
unpacking all tree objects along the way.

And it looks to me like diff_cache() in diff-lib.c, which is responsible
for this case, does not do any prefix trimming. traverse_trees() also
does not seem to do the "never_interesting" optimization that
tree_interesting() does, so if the traversed tree is big (~20k entries,
as you told me), it will take some time, even though you are only
interested in a single entry.

> So the only troubled variant is `git diff branch -- file`.

No, I suspect "git diff --cached" would also be slow. "git merge" would
definitely be slow. But we can hardly improve these cases, because those
commands are usually run tree-wide, with no path limiting.

If you only work on a small subset of files, there's some (unfinished)
code in the narrow clone implementation that cuts down the index size,
which may speed things up in big-index repositories like yours.
Could be a good reason for me (or someone) to extract that part and get it in before full narrow clone is implemented.
-- 
Duy
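
A toy sketch of the prefix-trimming idea discussed above, in Python rather than git's C. It is not git code and does not model tree unpacking: it assumes the index is just a sorted list of path strings, and the names scan_all, scan_trimmed and make_toy_index are made up for illustration. It only shows why a walk that ignores the path limit touches every entry, while a walk that uses the sort order can stop almost immediately.

```python
from bisect import bisect_left


def scan_all(index_paths, prefix):
    """Naive walk: touch every index entry, keep those starting with prefix."""
    visited = 0
    matches = []
    for path in index_paths:
        visited += 1
        if path.startswith(prefix):
            matches.append(path)
    return matches, visited


def scan_trimmed(index_paths, prefix):
    """Trimmed walk over a sorted list of paths.

    Entries sharing a prefix form one contiguous run in sorted order, so
    binary-search to the start of that run and stop at the first entry
    that no longer matches.
    """
    start = bisect_left(index_paths, prefix)
    visited = 0
    matches = []
    for i in range(start, len(index_paths)):
        visited += 1
        if not index_paths[i].startswith(prefix):
            break  # sorted order: nothing after this entry can match
        matches.append(index_paths[i])
    return matches, visited


def make_toy_index():
    """Hypothetical index: 10 top-level dirs with 100 files each, sorted."""
    return sorted(
        f"dir{d:02d}/sub/file{i:03d}" for d in range(10) for i in range(100)
    )


if __name__ == "__main__":
    paths = make_toy_index()
    for scan in (scan_all, scan_trimmed):
        matches, visited = scan(paths, "dir03/sub/file042")
        print(scan.__name__, matches, "visited:", visited)
```

Both functions return the same matches; only the visited count differs. The early break in scan_trimmed is loosely the same idea as the "never_interesting" result: once the sorted walk has passed the path limit, nothing later can match.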