On 10/21/2018 9:12 PM, Junio C Hamano wrote:
Jakub Narebski <jnareb@xxxxxxxxx> writes:
So if revs->limited is set (but not because revs->topo_order is set),
which means A..B queries, we will be still using the old algorithm.
All right, though I wonder if it could be improved in the future
(perhaps with the help of other graph labelling / indices than
generation numbers, maybe a positive-cut index).
Do you have an idea why there is no improvement with the new code in
this case?
I didn't get the impression that it would not be possible to improve
the "--topo A..B" case by using generation numbers from this series.
Isn't it just because the necessary code has not been written yet?
In addition to what is needed for "--topo P1 P2 P3..." (all
positive), limited walk needs to notice the bottom boundary and stop
traversal. Having generation numbers would make it slightly easier
than without, as you know that a positive commit you have will not
be marked UNINTERESTING due to a negative commit whose ancestors
have not been explored, as long as that negative commit has a higher
generation number. But you still need to adjust the traversal logic
to properly terminate upon hitting UNINTERESTING one, and also
propagate the bit down the history, which is not needed at all if
you only want to support the "positive only" case.
Actually, the code has been written, but the problem is the same as the
performance issue when I made paint_down_to_common() use generation
numbers: the algorithm for deciding what is in the set "reachable from A
but not reachable from B" uses commit-date order as a heuristic to avoid
walking the entire graph. Yes, the revision parameters specify "limited"
in order to call "limit_list()", but it uses the same algorithm to
determine the reachable set difference.
You can test this yourself! Run the following two commands in the Git
repository using v2.19.1:
time git log --topo-order -10 master >/dev/null
time git log --topo-order -10 maint..master >/dev/null
I get 0.39s for the first call and 0.01s for the second. (Note: I
specified "-10" to ensure we are only writing 10 commits and the output
size does not factor into the time.) This is because the first walks the
entire history, while the second uses the heuristic walk to identify a
much smaller subgraph that the topo-order algorithm uses.
Just as before, by using this algorithm for the B..A case, we are adding
an extra restriction on the algorithm: always be correct. This results
in us walking a larger set (everything reachable from B or A with
generation number at least the smallest generation of a commit reachable
from only one).
I believe this can be handled by using a smarter generation number (one
that relies on commit date as a heuristic, but still have enough
information to guarantee topological relationships), and I've already
started testing a few of these directions. It is possible now that we
have concrete graph algorithms to use on real repositories. I hope to
share a report on my findings in a couple weeks. I'll include how using
this algorithm compares to the old algorithm in the B..A case.
Thanks,
-Stolee