Re: [PATCH v3 1/4] connected: do not sort input revisions


 



On Mon, Aug 02 2021, Patrick Steinhardt wrote:

> In order to compute whether objects reachable from a set of tips are all
> connected, we do a revision walk with these tips as positive references
> and `--not --all`. `--not --all` will cause the revision walk to load
> all preexisting references as uninteresting, which can be very expensive
> in repositories with many references.
>
> Benchmarking the git-rev-list(1) command highlights that by far the most
> expensive single phase is initial sorting of the input revisions: after
> all references have been loaded, we first sort commits by author date.
> In a real-world repository with about 2.2 million references, it makes
> up about 40% of the total runtime of git-rev-list(1).
>
> Ultimately, the connectivity check shouldn't really bother about the
> order of input revisions at all. We only care whether we can actually
> walk all objects until we hit the cut-off point. So sorting the input is
> a complete waste of time.

Really good results:

> Introduce a new "--unsorted-input" flag to git-rev-list(1) which will
> cause it to not sort the commits and adjust the connectivity check to
> always pass the flag. This results in the following speedups, executed
> in a clone of gitlab-org/gitlab [1]:
>
>     Benchmark #1: git rev-list  --objects --quiet --not --all --not $(cat newrev)
>       Time (mean ± σ):      7.639 s ±  0.065 s    [User: 7.304 s, System: 0.335 s]
>       Range (min … max):    7.543 s …  7.742 s    10 runs
>
>     Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev
>       Time (mean ± σ):      4.995 s ±  0.044 s    [User: 4.657 s, System: 0.337 s]
>       Range (min … max):    4.909 s …  5.048 s    10 runs
>
>     Summary
>       'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran
>         1.53 ± 0.02 times faster than 'git rev-list  --objects --quiet --not --all --not $newrev'

Just bikeshedding in case of a re-roll: perhaps name it --unordered-input,
so that it rhymes with the existing "git cat-file --unordered", which
serves the same conceptual purpose (except that this option is about
input order and that one is about output order).
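
As a toy illustration of the commit message's point that the connectivity
check does not care about the order of its input revisions, here is a
minimal sketch. It is not git's implementation: the `history` graph and the
`reachable` and `is_connected` helpers are hypothetical, and a commit is
treated as "missing" simply when it has no entry in the dictionary.

```python
def reachable(parents, starts):
    """Return every known commit reachable from `starts` (toy `--not --all`)."""
    seen = set()
    stack = list(starts)
    while stack:
        commit = stack.pop()
        if commit in seen or commit not in parents:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def is_connected(parents, new_tips, existing_refs):
    """Walk from `new_tips` until hitting commits reachable from existing refs.

    Returns False as soon as the walk needs a commit that is not in `parents`.
    The tips are processed in whatever order they were given; nothing is sorted.
    """
    uninteresting = reachable(parents, existing_refs)
    seen = set()
    stack = list(new_tips)
    while stack:
        commit = stack.pop()
        if commit in uninteresting or commit in seen:
            continue
        if commit not in parents:
            return False  # a needed object is missing
        seen.add(commit)
        stack.extend(parents[commit])
    return True


# Hypothetical history: A <- B <- C, and D has parents C and X, where X is missing.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C", "X"]}

print(is_connected(history, ["C"], ["B"]))
print(is_connected(history, ["C", "D"], ["B"]))
print(is_connected(history, ["D", "C"], ["B"]))
```

The last two calls pass the same tips in different orders and give the same
answer, because the result depends only on which commits get visited, not
on when.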



