Re: [PATCH 0/1] v2.19.0-rc1 Performance Regression in 'git merge-base'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> As I was testing the release candidate, I stumbled across a regression in
> 'git merge-base' as a result of the switch to generation numbers. The commit
> message in [PATCH 1/1] describes the topology involved, but you can test it
> yourself by comparing 'git merge-base v4.8 v4.9' in the Linux kernel. The
> regression does not show up when running merge-base for tags at least v4.9,
> which is why I didn't see it when I was testing earlier.

Strange that it happens; I'll take a look.

> The solution is simple, but also will conflict with ds/reachable in next. I
> can send a similar patch that applies the same diff into commit-reach.c.
>
> With the integration of generation numbers into most commit walks coming to
> a close [1], it will be time to re-investigate other options for
> reachability indexes [2]. As I was digging into the issue with this
> regression, I discovered a way we can modify our generation numbers and pair
> them with commit dates to give us a simple-to-compute, immutable
> two-dimensional reachability index that would be immune to this regression.
> I will investigate that more and report back, but it is more important to
> fix this regression now.

I am interested in what you have created, especially because commit
creation dates are imperfect indicators because of clock skew etc.

>
> Thanks, -Stolee
>
> [1] https://public-inbox.org/git/pull.25.git.gitgitgadget@xxxxxxxxx/[PATCH
> 0/6] Use generation numbers for --topo-order
>
> [2] https://public-inbox.org/git/86muxcuyod.fsf@xxxxxxxxx/[RFC] Other chunks
> for commit-graph, part 2 - reachability indexes

Since then I have found few more possible approaches:
- working with repository metagraph[1], where all chains of commits were
  replaced by edge, perhaps together with DAG Reduction[2] boosting
  framework on this metagraph
- using FELINE-like index (the Graph+Label approch, also known as online
  search), and for those where this index have false results, use Labels
  only approach[3]

[1] Xian Tang et. al., "An Optimized Labeling Scheme for Reachability
    Queries", CMC, vol.55, no.2, pp.267-283, 2018
[2] Marco Biazzini, Martin Monperrus, Benoit Baudry "On Analyzing the
    Topology of Commit Histories in Decentralized Version Control
    Systems", ICSME 2014 (conference paper)
[3] Junfeng Zhou et. al., "Accelerating reachability query processing
    based on DAG reduction", The VLDB Journal (2018) 27: 271

--
Jakub Narębski




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux