On 12/15/2017 1:30 PM, Junio C Hamano wrote:
Derrick Stolee <stolee@xxxxxxxxx> writes:
The biggest reason for the 20 seconds is not just the number of
commits in the ahead/behind but how many commits are walked (including
common to both branches) before paint_down_to_common() breaks its
while loop due to queue_has_nonstale().
Hmm, queue_has_nonstale() looks to see if any element is not STALE
(where the definition of STALE is "known to be a common ancestor")
by potentially checking all elements in the queue. I wonder if we
have an opportunity for a trivial optimization? When the caller
knows that it dug one level and added the parents that are not
stale, it does not have to ask queue_has_nonstale() if there is any
non stale element, for example.
I thought this, too, but my tracing did not show significant time spent
in this method. 99% of the time is spent unzipping and parsing commits.
If this was taking too long, then we could track a minimum timestamp for
a commit that entered the queue in a non-stale state, but this will
delay the termination condition a bit since commits can be marked stale
after they enter the queue.
What do you exactly mean by "not just the number of commits in the
ahead/behind"? Aren't the number of these commits pretty much
proportional to the number of commits we need to paint down to
common ancestors? Is the latter a lot larger than the former
(i.e. are we somehow not stopping when we _could_ notice that we
can with better information)?
With the wide history, there is actually a large set of commits that are
in the common history but have newer commit times than the oldest commit
in only one history. Consider the following ASCII art:
A
|
1
|
2
|
3
|\
4 B
|\|
5 C
|\|
6 D
|\|
.
.
.
|\|
N Z
|/
0
Between A and B, A is ahead by commits {A,1,2,3,4,5,6,...,N}. Meanwhile,
commits B,C,D,...,Z are in the common history, but still must be walked.
Now imagine these two sets are actually much MUCH wider (thousands of
commits that are pairwise non-ancestors). This causes the number of
walked commits to be larger than just the number of commits in the
symmetric difference of the histories.
Unfortunately, generation numbers are not the only optimization needed
to make this call be sub-100ms. A graph data structure that lists the
edges between commits would prevent the time spent in zlib decompressing
and parsing commits. I'm working on investigating how such a data
structure (and corresponding file format) could integrate in the
commit-walking code.
Thanks,
-Stolee