Re: [PATCH v3 1/4] connected: do not sort input revisions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Patrick Steinhardt <ps@xxxxxx> writes:

> In order to compute whether objects reachable from a set of tips are all
> connected, we do a revision walk with these tips as positive references
> and `--not --all`. `--not --all` will cause the revision walk to load
> all preexisting references as uninteresting, which can be very expensive
> in repositories with many references.
>
> Benchmarking the git-rev-list(1) command highlights that by far the most
> expensive single phase is initial sorting of the input revisions: after
> all references have been loaded, we first sort commits by author date.
> In a real-world repository with about 2.2 million references, it makes
> up about 40% of the total runtime of git-rev-list(1).

Nice finding.

> Ultimately, the connectivity check shouldn't really bother about the
> order of input revisions at all. We only care whether we can actually
> walk all objects until we hit the cut-off point. So sorting the input is
> a complete waste of time.

Sorting of positive side is done to help both performance and
correctness in regular use of the traversal machinery, especially
when reachability bitmap is not in effect, but on the negative side
I do not think there is any downside to omit sorting offhand.  The
only case that may get affected is when the revision.c::SLOP kicks
in to deal with oddball commits with incorrect committer timestamps,
but then the result of the sorting isn't to be trusted anyway, so...

> Introduce a new "--unsorted-input" flag to git-rev-list(1) which will
> cause it to not sort the commits and adjust the connectivity check to
> always pass the flag. This results in the following speedups, executed
> in a clone of gitlab-org/gitlab [1]:
> ...
> [1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs
>      are visible to clients.

So is this the 2.2 million refs thing?

> @@ -3584,7 +3586,7 @@ int prepare_revision_walk(struct rev_info *revs)
>  
>  	if (!revs->reflog_info)
>  		prepare_to_use_bloom_filter(revs);
> -	if (revs->no_walk != REVISION_WALK_NO_WALK_UNSORTED)
> +	if (revs->no_walk != REVISION_WALK_NO_WALK_UNSORTED && !revs->unsorted_input)
>  		commit_list_sort_by_date(&revs->commits);

Looks quite straight-forward.

I however suspect that in the longer term it may be cleaner to get
rid of REVSISION_WALK_NO_WALK_UNSORTED if we do this.  The knob that
controls if we sort the initial traversal tips and the knob that
controls if we walk from these tips used to be tied to --no-walk
only because ca92e59e30b wanted to affect only no-walk case, but
with your new finding, it clearly is not limited to the no-walk case
to want to avoid sorting.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux