Re: [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/21/2018 9:12 PM, Junio C Hamano wrote:
Jakub Narebski <jnareb@xxxxxxxxx> writes:

So if revs->limited is set (but not because revs->topo_order is set),
which means A..B queries, we will be still using the old algorithm.
All right, though I wonder if it could be improved in the future
(perhaps with the help of other graph labelling / indices than
generation numbers, maybe a positive-cut index).

Do you have an idea why there is no improvement with the new code in
this case?
I didn't get the impression that it would not be possible to improve
the "--topo A..B" case by using generation numbers from this series.
Isn't it just because the necessary code has not been written yet?
In addition to what is needed for "--topo P1 P2 P3..." (all
positive), limited walk needs to notice the bottom boundary and stop
traversal.  Having generation numbers would make it slightly easier
than without, as you know that a positive commit you have will not
be marked UNINTERESTING due to a negative commit whose ancestors
have not been explored, as long as that negative commit has a higher
generation number.  But you still need to adjust the traversal logic
to properly terminate upon hitting UNINTERESTING one, and also
propagate the bit down the history, which is not needed at all if
you only want to support the "positive only" case.

Actually, the code has been written, but the problem is the same as the performance issue when I made paint_down_to_common() use generation numbers: the algorithm for deciding what is in the set "reachable from A but not reachable from B" uses commit-date order as a heuristic to avoid walking the entire graph. Yes, the revision parameters specify "limited" in order to call "limit_list()", but it uses the same algorithm to determine the reachable set difference.

You can test this yourself! Run the following two commands in the Git repository using v2.19.1:

    time git log --topo-order -10 master >/dev/null

    time git log --topo-order -10 maint..master >/dev/null

I get 0.39s for the first call and 0.01s for the second. (Note: I specified "-10" to ensure we are only writing 10 commits and the output size does not factor into the time.) This is because the first walks the entire history, while the second uses the heuristic walk to identify a much smaller subgraph that the topo-order algorithm uses.

Just as before, by using this algorithm for the B..A case, we are adding an extra restriction on the algorithm: always be correct. This results in us walking a larger set (everything reachable from B or A with generation number at least the smallest generation of a commit reachable from only one).

I believe this can be handled by using a smarter generation number (one that relies on commit date as a heuristic, but still have enough information to guarantee topological relationships), and I've already started testing a few of these directions. It is possible now that we have concrete graph algorithms to use on real repositories. I hope to share a report on my findings in a couple weeks. I'll include how using this algorithm compares to the old algorithm in the B..A case.

Thanks,

-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux