Patrick Steinhardt <ps@xxxxxx> writes: > In order to compute whether objects reachable from a set of tips are all > connected, we do a revision walk with these tips as positive references > and `--not --all`. `--not --all` will cause the revision walk to load > all preexisting references as uninteresting, which can be very expensive > in repositories with many references. > > Benchmarking the git-rev-list(1) command highlights that by far the most > expensive single phase is initial sorting of the input revisions: after > all references have been loaded, we first sort commits by author date. > In a real-world repository with about 2.2 million references, it makes > up about 40% of the total runtime of git-rev-list(1). Nice finding. > Ultimately, the connectivity check shouldn't really bother about the > order of input revisions at all. We only care whether we can actually > walk all objects until we hit the cut-off point. So sorting the input is > a complete waste of time. Sorting of positive side is done to help both performance and correctness in regular use of the traversal machinery, especially when reachability bitmap is not in effect, but on the negative side I do not think there is any downside to omit sorting offhand. The only case that may get affected is when the revision.c::SLOP kicks in to deal with oddball commits with incorrect committer timestamps, but then the result of the sorting isn't to be trusted anyway, so... > Introduce a new "--unsorted-input" flag to git-rev-list(1) which will > cause it to not sort the commits and adjust the connectivity check to > always pass the flag. This results in the following speedups, executed > in a clone of gitlab-org/gitlab [1]: > ... > [1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs > are visible to clients. So is this the 2.2 million refs thing? > @@ -3584,7 +3586,7 @@ int prepare_revision_walk(struct rev_info *revs) > > if (!revs->reflog_info) > prepare_to_use_bloom_filter(revs); > - if (revs->no_walk != REVISION_WALK_NO_WALK_UNSORTED) > + if (revs->no_walk != REVISION_WALK_NO_WALK_UNSORTED && !revs->unsorted_input) > commit_list_sort_by_date(&revs->commits); Looks quite straight-forward. I however suspect that in the longer term it may be cleaner to get rid of REVSISION_WALK_NO_WALK_UNSORTED if we do this. The knob that controls if we sort the initial traversal tips and the knob that controls if we walk from these tips used to be tied to --no-walk only because ca92e59e30b wanted to affect only no-walk case, but with your new finding, it clearly is not limited to the no-walk case to want to avoid sorting.