[RFH] git cherry vs. git rev-list --cherry, or: Why does "..." suck?

In the process of converting "git cherry" and "git format-patch" to use
the new rev-list options (the saner way according to d7a17ca (git-log
--cherry-pick A...B, 2007-04-09) already!), I have a simple question and
a hard one, and I'd like help with both:

run_command
===========

I could either use run_command_v_opt(args, RUN_GIT_CMD) or set up the
revision walker and call it myself. For the former I have to check how to
treat the third argument to "git cherry"; the latter seems to be more code
(and I would need to call the rev-list/log output loop somehow).

Is there a general preference for using or avoiding run_command?

(There's also the question of what details of git cherry's output I need
to preserve.)


Performance
===========

I don't get this:

git cherry A B: 0.4s
git rev-list --cherry A...B: 1.7s
(more details below)

This makes "rev-list --cherry" almost unacceptable as a replacement. But
I'd like to understand this difference (and maybe do something about
it). I'm lost with gprof, but here are more details on the timing:

A is pu at 0f169fc
B is next at 5ddab49 plus three commits which are not upstream

rev-list --count 5ddab49..A is 166 (117 without merges), for B it is 3

Now the timings (rev-list done with --count):

cherry A B: 0.4s
cherry B A: 0.4s
rev-list --cherry A...B: 1.7s

The latter computes merge bases (there are 25), the former does not. How
much does that cost?

merge-base A B: 0.95s
merge-base --all A B: 0.95s
rev-parse A...B: 0.95s

So this accounts for much of the difference (and we need to do something
about get_merge_bases()), but not all. How much does the patch-id
computation cost?

rev-list --no-merges --right-only --cherry-pick A...B: 1.7s
(the above is --cherry)
rev-list --no-merges --right-only A...B: 1.0s
rev-list --no-merges --left-right A...B: 1.0s

Why does it take rev-list 0.7s to do the same patch-id computations that
cherry does in less than 0.4s? (More details on what they do below.)

rev-list --no-merges A..B: 0.04s (counting to 3, yeah)
rev-list --no-merges B..A: 0.6s (counting to 117)

The latter involves neither patch-id nor merge-base computation. Should
this really take 0.6s?
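
For anyone who wants to reproduce these numbers, here is a minimal sketch of a timing harness. It is only an illustration: the helper names time_command() and commands() are made up for this example, and it takes the best wall-clock time over a few runs, which may not match how the figures above were measured.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Return the best wall-clock time (seconds) over `runs` executions."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best


def commands(a, b):
    """Command lines compared in this mail, for revisions a and b."""
    return [
        ["git", "cherry", a, b],
        ["git", "rev-list", "--count", "--cherry", f"{a}...{b}"],
        ["git", "merge-base", "--all", a, b],
        ["git", "rev-list", "--count", "--no-merges", "--right-only",
         f"{a}...{b}"],
        ["git", "rev-list", "--count", "--no-merges", f"{a}..{b}"],
        ["git", "rev-list", "--count", "--no-merges", f"{b}..{a}"],
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (inside a git repository): python timing.py A B
    for argv in commands(sys.argv[1], sys.argv[2]):
        print(f"{time_command(argv):6.2f}s  {' '.join(argv)}")
```

Taking the minimum over several runs reduces the influence of a cold cache, which matters when the differences are well under a second.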

I'm stumped. Help, please!

Michael

What the commands roughly do:

cherry A B [limit]:
===================
add pending B ^A
walk B..A (on temp rev_info) and
add_commit_patch_id() on these
clear_commit_marks()
add pending ^limit if specified
walk A..B and
reverse that list and
has_commit_patch_id() on these

rev-list --cherry A...B:
========================
get_merge_bases for A,B
add pending --not merge bases
add pending A B
add_commit_patch_id() on smaller side
has_commit_patch_id() on other side (&& mark id seen)
recheck smaller side (based on id->seen)

This seems to enumerate A..B and B..A more often, but is iterating
through a commit list really that time-consuming? The number of patch-id
computations is the same as far as I can see.
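
The two outlines above can be modelled on a toy history to check that they classify the same commits as duplicates. This is a sketch under loud assumptions: commits are entries in a plain dict, patch ids are made-up strings, and ranges are computed by set difference rather than by git's revision walker or get_merge_bases(). It illustrates the logic only and says nothing about the performance gap.

```python
# Toy history: base <- a1 <- a2 (side A) and base <- b1 <- b2 (side B).
# b1 carries the same patch id as a2, i.e. it is a cherry-picked change.
GRAPH = {
    "base": {"parents": [], "patch_id": "p0"},
    "a1": {"parents": ["base"], "patch_id": "p1"},
    "a2": {"parents": ["a1"], "patch_id": "p2"},
    "b1": {"parents": ["base"], "patch_id": "p2"},
    "b2": {"parents": ["b1"], "patch_id": "p3"},
}


def ancestors(graph, tip):
    """Set of commits reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit]["parents"])
    return seen


def rev_range(graph, exclude, include):
    """Toy 'exclude..include' with merges dropped (like --no-merges)."""
    commits = ancestors(graph, include) - ancestors(graph, exclude)
    return {c for c in commits if len(graph[c]["parents"]) <= 1}


def cherry(graph, upstream, head):
    """Model of 'cherry upstream head': '-' if already upstream, else '+'."""
    ids = {graph[c]["patch_id"] for c in rev_range(graph, head, upstream)}
    return {c: "-" if graph[c]["patch_id"] in ids else "+"
            for c in rev_range(graph, upstream, head)}


def rev_list_cherry(graph, a, b):
    """Model of 'rev-list --cherry a...b': '=' marks an equivalent commit."""
    left, right = rev_range(graph, b, a), rev_range(graph, a, b)
    small, large = (left, right) if len(left) < len(right) else (right, left)
    seen = {graph[c]["patch_id"]: False for c in small}  # ids of smaller side
    same = set()
    for c in large:  # look up the other side, remember which ids matched
        pid = graph[c]["patch_id"]
        if pid in seen:
            seen[pid] = True
            same.add(c)
    for c in small:  # recheck the smaller side based on the "seen" flag
        if seen[graph[c]["patch_id"]]:
            same.add(c)
    return {c: "=" if c in same else "+" for c in right}  # right side only


print(cherry(GRAPH, "a2", "b2"))
print(rev_list_cherry(GRAPH, "a2", "b2"))
```

In this model both functions flag b1 as the duplicate and b2 as new, and each side's patch ids are computed once, consistent with the observation that the number of patch-id computations is the same.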