Re: What's the difference between `git show branch:file | diff -u - file` vs `git diff branch file`?

On Mon, Aug 29, 2011 at 9:48 PM, Marat Radchenko <marat@xxxxxxxxxxxxxxxx> wrote:
> Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:
>>  - is "file" above at top repo, or is it actually very/deep/path/to/a/file?
> 3 levels deep. Most parent dir (one after repo root) contains 20k files.
>
>>  - how many entries in the tree that contain "file"?
> Sorry, didn't understand this.

You have already answered it. I was asking about the size of the parent
dir, but phrased it poorly.

>>  - how is "git ls-files | wc -l"?
> $ time git ls-files | wc -l
> 603137
>
> real    0m0.417s
> user    0m0.440s
> sys     0m0.060s
>
>>  - how about "time git diff branch another-branch -- file >/dev/null"?
>> That'd remove unpack-trees code.
> Pretty fast:
>
> git diff HEAD branch -- file > /dev/null
>
> real    0m0.276s
> user    0m0.240s
> sys     0m0.030s

That may explain it. "git diff <ref>" walks through the index, unpacks
tree objects along the way, matches up entries with the same path from
the branch and the index, then feeds the matching entries to the diff
function. If tree cutting is not done efficiently, it could very well
walk through every entry in the index (~600k entries in your case),
unpacking all tree objects along the way.
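
To make the cost concrete, here is a toy model in Python. It is not
git's code: the index is modelled as a plain sorted list of path
strings, and walk_all() and walk_trimmed() are hypothetical names made
up for this sketch. It only shows why a walk that is not cut down
touches every entry, while one that uses the sort order touches a
handful.

```python
import bisect


def walk_all(index_paths, prefix):
    """Naive walk: look at every entry, keep those under `prefix`."""
    visited = 0
    matches = []
    for path in index_paths:
        visited += 1
        if path.startswith(prefix):
            matches.append(path)
    return matches, visited


def walk_trimmed(index_paths, prefix):
    """Trimmed walk over a sorted list.

    Binary-search to the first entry >= prefix, then stop at the first
    entry that no longer starts with it. In a sorted list all entries
    sharing a prefix are contiguous, so nothing is missed.
    """
    visited = 0
    matches = []
    start = bisect.bisect_left(index_paths, prefix)
    for i in range(start, len(index_paths)):
        visited += 1
        if not index_paths[i].startswith(prefix):
            break
        matches.append(index_paths[i])
    return matches, visited


# Toy "index": 30 directories with 1000 files each (made-up paths).
index_paths = sorted(
    f"dir{d:02d}/sub/file{i:04d}.c" for d in range(30) for i in range(1000)
)
wanted = "dir07/sub/file0042.c"

print(walk_all(index_paths, wanted)[1])      # entries visited, naive
print(walk_trimmed(index_paths, wanted)[1])  # entries visited, trimmed
```

The model ignores tree unpacking entirely, and Python's string order
only matches byte-wise path order for ASCII names, so treat the counts
as an illustration rather than a measurement.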

And it looks to me like diff_cache() in diff-lib.c, responsible for
this case, does not do any prefix trimming. traverse_trees() also does
not seem to do the "never_interesting" optimization that
tree_interesting() does, so if the traversed tree is big (~20k entries,
as you told me), it will take some time, even though you are only
interested in a single entry.
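
The idea behind a "never_interesting"-style cutoff can be sketched in a
few lines of Python. This is a simplified model, not what
tree_interesting() actually does: one tree is a sorted list of names,
a single exact name is wanted, and scan_full() and scan_early_exit()
are hypothetical helpers. Real tree ordering has extra rules for
directories that the sketch leaves out.

```python
def scan_full(names, wanted):
    """Check every entry, even after the answer is already known."""
    visited = 0
    found = False
    for name in names:
        visited += 1
        if name == wanted:
            found = True
    return found, visited


def scan_early_exit(names, wanted):
    """Stop as soon as no later entry can match.

    `names` must be sorted. Once an entry sorts after `wanted`, every
    remaining entry does too, so the rest of the tree can be skipped.
    """
    visited = 0
    for name in names:
        visited += 1
        if name == wanted:
            return True, visited
        if name > wanted:
            return False, visited
    return False, visited


# Toy tree with 20000 entries; zero-padded names are already sorted.
names = [f"file{i:05d}.c" for i in range(20000)]

print(scan_full(names, "file00010.c"))        # visits all 20000 entries
print(scan_early_exit(names, "file00010.c"))  # stops after 11 entries
```

The saving depends on where the wanted name sorts: an entry near the
end of a big tree still costs close to a full scan.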

> So the only troubled variant is `git diff branch -- file`.

No, I suspect "git diff --cached" would also be slow. "git merge"
would definitely be slow. But we can hardly improve these cases
because the commands are usually called tree-wide, with no path
limiting.

If you only work on a small subset of files, there's some (unfinished)
code in the narrow clone implementation that cuts down the index size,
which may speed things up in big-index repositories like yours. That
could be a good reason for me (or someone) to extract that part and get
it in before full narrow clone is implemented.
-- 
Duy