Re: Speed of git branch --contains

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 24 Jan 2018 17:20:13 +0100

On Tue, Jan 23 2018, Andreas Krey jotted:

> I'm just looking at some scripts that do a 'git branch --contains $id --remote'
> for each new commit in a repo, and unfortunately each invocation already
> takes four minutes.
>
> It feels like git branch does the reachability detection separately
> for each branch potentially listed. The alternative would be to
>
> - invert the parent map to a child map,
> - use that to compute the set of commits that contain $id,
> - then use that as predicate whether to show a given branch
>   (show iff its head is in the set)
>
> That would speed things up considerably,
> but what are the chances to see that change in git?
>
> I can do that as well within the script, with the additional
> benefit that I only need to do the inversion once, but I might
> instead take a stab at git branch.
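For reference, the child-map idea quoted above can be sketched roughly as follows. This is a minimal illustration, not git's implementation. It assumes the history has already been loaded into a hypothetical `parents` dict that maps each commit ID to its parent IDs (for instance parsed from `git rev-list --parents --all` output), and that `branch_heads` is a hypothetical dict from branch name to head commit:

```python
from collections import defaultdict, deque


def child_map(parents):
    """Invert a commit -> [parents] map into commit -> [children]."""
    children = defaultdict(list)
    for commit, ps in parents.items():
        for p in ps:
            children[p].append(commit)
    return children


def containing_set(children, target):
    """Every commit that has `target` in its history (target included)."""
    seen = {target}
    queue = deque([target])
    while queue:
        commit = queue.popleft()
        for child in children.get(commit, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def branches_containing(branch_heads, children, target):
    """Show a branch iff its head is in the set of commits containing target."""
    contains = containing_set(children, target)
    return sorted(name for name, head in branch_heads.items() if head in contains)


if __name__ == "__main__":
    # Hypothetical toy history: E merges C and D.
    parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"], "E": ["C", "D"]}
    heads = {"origin/main": "E", "origin/topic": "D", "origin/old": "B"}
    children = child_map(parents)  # invert once, reuse for every new commit
    print(branches_containing(heads, children, "C"))  # ['origin/main']
```

Passing `children` in rather than rebuilding it per call reflects the point made above about doing the inversion only once: each new `$id` then costs only a walk over its descendants.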

I posted something similar to the list the other day, and Derrick had a
great follow-up to that which summarized the current work on this:
https://public-inbox.org/git/87608bawoa.fsf@xxxxxxxxxxxxxxxxxxx/

Junio mentioned an edge case in that thread which you may not have
thought of (I didn't): one problem with such a mapping is that a new
branch may at any point push new history that includes your commit via
a merge, forcing you to recompute this child map.

That can be optimized by checking whether some commits come after others
timestamp-wise, but that brings us to the problem that git doesn't
guarantee commit timestamps to be monotonically increasing (they may even
be years off), which is another optimization challenge for things like
--contains.
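To make the timestamp problem concrete, here is a hypothetical, simplified date-cutoff check (not git's actual code). It walks back from a branch head and prunes any commit older than the target, which is only safe if timestamps never decrease from parent to child. The `parents` and `dates` dicts are illustrative assumptions:

```python
def contains_exact(parents, head, target):
    """Plain reachability: is `target` an ancestor of (or equal to) `head`?"""
    stack, seen = [head], set()
    while stack:
        commit = stack.pop()
        if commit == target:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def contains_with_date_cutoff(parents, dates, head, target):
    """Same walk, but skip commits dated before `target`.

    Heuristic: assumes a descendant is never dated earlier than its
    ancestor. Clock skew breaks that assumption and yields false negatives.
    """
    cutoff = dates[target]
    stack, seen = [head], set()
    while stack:
        commit = stack.pop()
        if commit == target:
            return True
        if commit in seen or dates[commit] < cutoff:
            continue  # pruned: "too old" to contain target
        seen.add(commit)
        stack.extend(parents[commit])
    return False


if __name__ == "__main__":
    parents = {"A": [], "B": ["A"], "C": ["B"]}
    # C was committed on a machine whose clock was behind B's.
    skewed = {"A": 100, "B": 200, "C": 150}
    print(contains_exact(parents, "C", "B"))                     # True
    print(contains_with_date_cutoff(parents, skewed, "C", "B"))  # False (wrong)
```

The pruning is what makes the cutoff version fast on large histories, but a single commit with a skewed date is enough to make it give the wrong answer.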