Re: [PATCH v2 09/11] commit-graph: merge commit-graph chains

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 23 May 2019 02:43:06 +0200

On Wed, May 22 2019, Derrick Stolee via GitGitGadget wrote:

> To keep lookups fast, but also keep most incremental writes fast, create
> a strategy for merging levels of the commit-graph chain. The strategy is
> detailed in the commit-graph design document, but is summarized by these
> two conditions:
>
>   1. If the number of commits we are adding is more than half the number
>      of commits in the graph below, then merge with that graph.
>
>   2. If we are writing more than 64,000 commits into a single graph,
>      then merge with all lower graphs.
>
> The numeric values in the conditions above are currently constant, but
> can become config options in a future update.
> [...]
> +## Merge Strategy
> +
> +When writing a set of commits that do not exist in the commit-graph stack of
> +height N, we default to creating a new file at level N + 1. We then decide to
> +merge with the Nth level if one of two conditions holds:
> +
> +  1. The expected file size for level N + 1 is at least half the file size for
> +     level N.
> +
> +  2. Level N + 1 contains more than MAX_SPLIT_COMMITS commits (64,000
> +     commits).
> +
> +This decision cascades down the levels: when we merge a level we create a new
> +set of commits that then compares to the next level.
> +
> +The first condition bounds the number of levels to be logarithmic in the total
> +number of commits.  The second condition bounds the total number of commits in
> +a `graph-{hashN}` file and not in the `commit-graph` file, preventing
> +significant performance issues when the stack merges and another process only
> +partially reads the previous stack.
> +
> +The merge strategy values (2 for the size multiple, 64,000 for the maximum
> +number of commits) could be extracted into config settings for full
> +flexibility.
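
For reference, the cascade described in the quoted text can be sketched
roughly as below. This is only an illustration and not the code from the
patch: the function name, the array layout and the use of commit counts as
a stand-in for "expected file size" are all assumptions made for the
example. The two constants are the values quoted above.

```c
#include <stddef.h>

/* Values from the quoted design text; the macro names are hypothetical. */
#define SIZE_MULTIPLE 2
#define MAX_SPLIT_COMMITS 64000

/*
 * level_commits[0..height-1] holds the commit count of each existing
 * level, with the base graph at index 0. new_commits is the number of
 * commits about to be written at level N + 1.
 *
 * Returns how many existing levels are left untouched; every level at or
 * above that index is folded into the new file. *merged_commits receives
 * the commit count of the resulting top file.
 *
 * Assumption: commit counts approximate file sizes for condition 1.
 */
static size_t levels_kept_after_merge(const size_t *level_commits,
				      size_t height, size_t new_commits,
				      size_t *merged_commits)
{
	size_t total = new_commits;

	while (height > 0) {
		size_t below = level_commits[height - 1];

		/* Neither condition holds: stop cascading. */
		if (total * SIZE_MULTIPLE < below &&
		    total <= MAX_SPLIT_COMMITS)
			break;

		/* Merge with the level below and compare again. */
		total += below;
		height--;
	}

	*merged_commits = total;
	return height;
}
```

Since the running total only grows, once the second condition triggers it
keeps triggering at every lower level, which matches the "merge with all
lower graphs" wording in the commit message.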

As noted this can become configurable, so it's no big deal. But is there
any reason for this 64K limit anymore?

While with the default expiry of 0sec we can still get that race, it
seems unlikely in practice, as the "commit-graph write" process would
write a new manifest at the end, then go and unlink() the old files.

So maybe at this point we could make this even dumber with something
that behaves like gc.autoPackLimit? I.e. keep writing new graphs, and
then coalesce them all (or maybe not the "base" graph, like
gc.bigPackThreshold)?
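
Something along these lines, as a very rough sketch. Nothing here exists in
git: the struct, its fields and the function name are made up for
illustration, and the limit is only meant to be analogous in spirit to
gc.autoPackLimit.

```c
#include <stddef.h>

/* Hypothetical policy knobs; not real config settings. */
struct coalesce_policy {
	size_t max_graphs; /* coalesce once the chain has more files than this */
	int keep_base;     /* non-zero: leave the bottom ("base") graph alone */
};

/*
 * Given the chain height after appending a new graph file, return how
 * many of the top-most files should be merged into a single file.
 * A return value of 0 means "just keep the newly appended file".
 */
static size_t graphs_to_coalesce(size_t height,
				 const struct coalesce_policy *p)
{
	size_t n;

	if (height <= p->max_graphs)
		return 0;

	/* Optionally exclude the base graph from the merge. */
	n = p->keep_base ? height - 1 : height;

	/* Merging fewer than two files would be a no-op. */
	return n >= 2 ? n : 0;
}
```

With max_graphs = 3 and keep_base set, a fourth file would trigger a merge
of the top three files and leave the base graph as it is.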

Also: These docs refer to MAX_SPLIT_COMMITS, but in v2 it's now a
"split_strategy_max_commits" variable instead.