Re: [PATCH v3 01/14] commit-graph: document commit-graph chains

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 05 Jun 2019 10:22:16 -0700

"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> Add a basic description of commit-graph chains. More details about the
> feature will be added as we add functionality. This introduction gives a
> high-level overview to the goals of the feature and the basic layout of
> commit-graph chains.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  Documentation/technical/commit-graph.txt | 59 ++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
>
> diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
> index fb53341d5e..1dca3bd8fe 100644
> --- a/Documentation/technical/commit-graph.txt
> +++ b/Documentation/technical/commit-graph.txt
> @@ -127,6 +127,65 @@ Design Details
>    helpful for these clones, anyway. The commit-graph will not be read or
>    written when shallow commits are present.
>  
> +Commit Graphs Chains
> +--------------------
> +
> +Typically, repos grow with near-constant velocity (commits per day). Over time,
> +the number of commits added by a fetch operation is much smaller than the
> +number of commits in the full history. By creating a "chain" of commit-graphs,
> +we enable fast writes of new commit data without rewriting the entire commit
> +history -- at least, most of the time.
> +
> +## File Layout
> +
> +A commit-graph chain uses multiple files, and we use a fixed naming convention
> +to organize these files. Each commit-graph file has a name
> +`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex-
> +valued hash stored in the footer of that file (which is a hash of the file's
> +contents before that hash). For a chain of commit-graph files, a plain-text
> +file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the
> +hashes for the files in order from "lowest" to "highest".
> +
> +For example, if the `commit-graph-chain` file contains the lines
> +
> +```
> +	{hash0}
> +	{hash1}
> +	{hash2}
> +```
> +
> +then the commit-graph chain looks like the following diagram:
> +
> + +-----------------------+
> + |  graph-{hash2}.graph  |
> + +-----------------------+
> +	  |
> + +-----------------------+
> + |                       |
> + |  graph-{hash1}.graph  |
> + |                       |
> + +-----------------------+
> +	  |
> + +-----------------------+
> + |                       |
> + |                       |
> + |                       |
> + |  graph-{hash0}.graph  |
> + |                       |
> + |                       |
> + |                       |
> + +-----------------------+
> +
> +Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of
> +commits in `graph-{hash1}.graph`, and X2 be the number of commits in
> +`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`,
> +then we interpret this as being the commit in position (X0 + X1 + i), and that
> +will be used as its "graph position". The commits in `graph-{hash2}.graph` use these
> +positions to refer to their parents, which may be in `graph-{hash1}.graph` or
> +`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking
> +its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
> +X2).

One thing that I fail to read from the above is what it means for
graphs to be inside a single chain.  What is the significance for a
graph file graph-{hash1}.graph to be between graph-{hash0}.graph and
graph-{hash2}.graph?   For example, is any of the following true?

 - For a commit in graph-{hash1}.graph file, if graph->{hash0}.graph
   or any other graph files lower in the position in the chain were
   unavailable, information on some ancestor of that commit may not
   be available.

 - Even if graph-{hash2}.graph or any other graph files higher in
   the position in the chain gets lost, information on a commit in
   graph-{hash1}.graph file or any of its ancestors is not affected.

Another thing I've assumed to be true but cannot read from the above
description is that the hashes in `commit-graph-chain` file, other
than the newest one, are merely redundant information, and each
graph file records the hash of its "previous" graph file (i.e. the
one that used to be the youngest before it got created).