Re: cygwin, 44k files: how to commit only index?

fork0@xxxxxxxxxxx (Alex Riesen) writes:

> yes, except that it'll compare the whole trees. Could I make it stop
> at first mismatch? "-q|--quiet" for git-diff-index perhaps?
> It's just not only stat, but also, open, read, mmap (yes, I try to use
> it for packs) and close are really slow here as well.

That sounds like optimizing for the wrong case -- you expect the
index to match HEAD and are trying to catch mistakes by detecting
a mismatch, right?

Having said that, I should point out that optimizing
"diff-index --cached" for cases where the index is expected to
mostly match HEAD is low-hanging fruit.

The current code for "diff-index --cached" reads the whole tree
into the index as stage #1 entries (diff-lib.c::run_diff_index),
and then compares stage #0 (from the original index contents)
and stage #1 (the tree parameter from the command line).  Even
if you stop at the first mismatch, you would already have paid
the overhead to open and read all tree objects before even
starting the comparison.
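
To illustrate why stopping early does not help here, below is a
toy model in Python (not git's actual code; the object store
layout and the names read_tree_as_stage1 and first_mismatch are
made up for this sketch).  The tree is flattened completely before
any comparison starts, so every tree object is opened even when
the very first path differs.

```python
# Toy object store (an assumption of this sketch, not git's format):
# tree id -> {name: (kind, object id)}.
STORE = {
    "T_root": {"Makefile": ("blob", "b1"),
               "fs": ("tree", "T_fs"),
               "mm": ("tree", "T_mm")},
    "T_fs": {"inode.c": ("blob", "b2"), "ext3": ("tree", "T_ext3")},
    "T_ext3": {"super.c": ("blob", "b3")},
    "T_mm": {"slab.c": ("blob", "b4")},
}

# Stage #0: the index, as a flat path -> blob id map.
INDEX = {
    "Makefile": "b1-changed",
    "fs/inode.c": "b2",
    "fs/ext3/super.c": "b3",
    "mm/slab.c": "b4",
}


def read_tree_as_stage1(store, tree_id, prefix="", reads=None):
    """Flatten a tree into {path: blob id}; also count tree objects opened."""
    if reads is None:
        reads = [0]
    reads[0] += 1  # one tree object opened and parsed
    entries = {}
    for name, (kind, oid) in sorted(store[tree_id].items()):
        path = prefix + name
        if kind == "tree":
            sub, _ = read_tree_as_stage1(store, oid, path + "/", reads)
            entries.update(sub)
        else:
            entries[path] = oid
    return entries, reads[0]


def first_mismatch(stage0, stage1):
    """Return the first path whose blob id differs, or None."""
    for path in sorted(set(stage0) | set(stage1)):
        if stage0.get(path) != stage1.get(path):
            return path
    return None


# All four tree objects are read before the comparison even begins.
stage1, trees_read = read_tree_as_stage1(STORE, "T_root")
print(trees_read, first_mismatch(INDEX, stage1))
```

In this model the mismatch is found at the first path compared,
but the cost of reading all four trees has already been paid.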

However, this code dates from the ancient time before cache-tree
was introduced in the index.  If the index is expected to mostly
match HEAD, most of the cache-tree nodes are up-to-date, and a
whole subtree can be skipped with a single comparison between
two tree SHA-1s at a shallower level of the directory tree.
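
A rough sketch of that idea, again as a toy Python model rather
than anything taken from git: the cache-tree is modelled as a
map from directory prefix to tree id, with None standing in for
an invalidated node.  The function name diff_cached and the data
layout are assumptions of this sketch, and file/directory type
changes are ignored to keep it short.

```python
# Toy object store (an assumption of this sketch, not git's format):
# tree id -> {name: (kind, object id)}.
STORE = {
    "T_root": {"Makefile": ("blob", "b1"),
               "fs": ("tree", "T_fs"),
               "mm": ("tree", "T_mm")},
    "T_fs": {"inode.c": ("blob", "b2"), "ext3": ("tree", "T_ext3")},
    "T_ext3": {"super.c": ("blob", "b3")},
    "T_mm": {"slab.c": ("blob", "b4")},
}

# The index differs from the tree only in mm/slab.c.
INDEX = {
    "Makefile": "b1",
    "fs/inode.c": "b2",
    "fs/ext3/super.c": "b3",
    "mm/slab.c": "b4-changed",
}

# Hypothetical cache-tree: directory prefix -> tree id, None if invalidated.
# Touching mm/slab.c invalidated "mm/" and the root, but not "fs/".
CACHE_TREE = {"": None, "fs/": "T_fs", "fs/ext3/": "T_ext3", "mm/": None}


def diff_cached(store, tree_id, index, cache_tree, prefix="", stats=None):
    """Return (changed paths, number of tree objects opened)."""
    if stats is None:
        stats = {"opened": 0}
    changed = []
    # A valid cache-tree node with the same id means the whole
    # subtree matches; skip it without opening any tree object.
    if cache_tree.get(prefix) == tree_id:
        return changed, stats["opened"]
    stats["opened"] += 1
    entries = store[tree_id]
    for name, (kind, oid) in sorted(entries.items()):
        path = prefix + name
        if kind == "tree":
            sub, _ = diff_cached(store, oid, index, cache_tree,
                                 path + "/", stats)
            changed.extend(sub)
        elif index.get(path) != oid:
            changed.append(path)  # modified in, or missing from, the index
    # Index paths under this directory whose first component is
    # not in the tree at all were added.
    for path in sorted(index):
        if path.startswith(prefix):
            first = path[len(prefix):].split("/", 1)[0]
            if first not in entries:
                changed.append(path)
    return changed, stats["opened"]


changed, opened = diff_cached(STORE, "T_root", INDEX, CACHE_TREE)
print(changed, opened)  # ['mm/slab.c'] 2
```

Only the root and mm/ trees are opened here; fs/ and everything
below it is skipped on the strength of one id comparison.  With a
fully valid cache-tree and an unchanged index, the comparison of
the two root ids alone settles the question.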

In 'pu' (jc/diff topic), I have very generic code that walks the
index, the working tree and zero or more trees in parallel, taking
advantage of cache-tree.  If somebody is interested in learning the
internals of git, some of the code could be lifted from there
and simplified to walk just the index and a single tree, and I
think that would optimize "diff-index --cached" quite a bit.

A very unscientific test, run in the kernel repository I just
pulled (hot cache) on my box, gives:

$ /usr/bin/time git diff-index -r --cached --abbrev v2.6.19 >/tmp/1
0.91user 0.20system 0:01.12elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+10949minor)pagefaults 0swaps

while the para-walk to produce the moral equivalent is:

$ /usr/bin/time test-para --no-work v2.6.19 >/tmp/2
0.11user 0.02system 0:00.13elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4524minor)pagefaults 0swaps
