Re: [PATCH v3 01/14] commit-graph: document commit-graph chains

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/5/2019 1:22 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>>
>> Add a basic description of commit-graph chains. More details about the
>> feature will be added as we add functionality. This introduction gives a
>> high-level overview to the goals of the feature and the basic layout of
>> commit-graph chains.
>>
>> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>> ---
>>  Documentation/technical/commit-graph.txt | 59 ++++++++++++++++++++++++
>>  1 file changed, 59 insertions(+)
>>
>> diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
>> index fb53341d5e..1dca3bd8fe 100644
>> --- a/Documentation/technical/commit-graph.txt
>> +++ b/Documentation/technical/commit-graph.txt
>> @@ -127,6 +127,65 @@ Design Details
>>    helpful for these clones, anyway. The commit-graph will not be read or
>>    written when shallow commits are present.
>>  
>> +Commit Graphs Chains
>> +--------------------
>> +
>> +Typically, repos grow with near-constant velocity (commits per day). Over time,
>> +the number of commits added by a fetch operation is much smaller than the
>> +number of commits in the full history. By creating a "chain" of commit-graphs,
>> +we enable fast writes of new commit data without rewriting the entire commit
>> +history -- at least, most of the time.
>> +
>> +## File Layout
>> +
>> +A commit-graph chain uses multiple files, and we use a fixed naming convention
>> +to organize these files. Each commit-graph file has a name
>> +`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex-
>> +valued hash stored in the footer of that file (which is a hash of the file's
>> +contents before that hash). For a chain of commit-graph files, a plain-text
>> +file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the
>> +hashes for the files in order from "lowest" to "highest".
>> +
>> +For example, if the `commit-graph-chain` file contains the lines
>> +
>> +```
>> +	{hash0}
>> +	{hash1}
>> +	{hash2}
>> +```
>> +
>> +then the commit-graph chain looks like the following diagram:
>> +
>> + +-----------------------+
>> + |  graph-{hash2}.graph  |
>> + +-----------------------+
>> +	  |
>> + +-----------------------+
>> + |                       |
>> + |  graph-{hash1}.graph  |
>> + |                       |
>> + +-----------------------+
>> +	  |
>> + +-----------------------+
>> + |                       |
>> + |                       |
>> + |                       |
>> + |  graph-{hash0}.graph  |
>> + |                       |
>> + |                       |
>> + |                       |
>> + +-----------------------+
>> +
>> +Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of
>> +commits in `graph-{hash1}.graph`, and X2 be the number of commits in
>> +`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`,
>> +then we interpret this as being the commit in position (X0 + X1 + i), and that
>> +will be used as its "graph position". The commits in `graph-{hash2}.graph` use these
>> +positions to refer to their parents, which may be in `graph-{hash1}.graph` or
>> +`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking
>> +its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
>> +X2).
> 
> One thing that I fail to read from the above is what it means for
> graphs to be inside a single chain.  What is the significance for a
> graph file graph-{hash1}.graph to be between graph-{hash0}.graph and
> graph-{hash2}.graph?   For example, is any of the following true?
> 
>  - For a commit in graph-{hash1}.graph file, if graph->{hash0}.graph
>    or any other graph files lower in the position in the chain were
>    unavailable, information on some ancestor of that commit may not
>    be available.

Not only that, but if graph-{hash0}.graph is unavailable, then the
graph-{hash1}.graph file is _unusable_. For example, we don't track
the number of commits in the base graph, so we could not accurately
find the commit-parent links even within graph-{hash1}.graph.

>  - Even if graph-{hash2}.graph or any other graph files higher in
>    the position in the chain gets lost, information on a commit in
>    graph-{hash1}.graph file or any of its ancestors is not affected.

This is correct. As we "build from the bottom" we can stop if we fail
to find a file, and all the data we already accessed is still valid.

> Another thing I've assumed to be true but cannot read from the above
> description is that the hashes in `commit-graph-chain` file, other
> than the newest one, are merely redundant information, and each
> graph file records the hash of its "previous" graph file (i.e. the
> one that used to be the youngest before it got created).

If the entire chain is available, then we only really need the tip hash.
However, having the entire chain in this file allows the "building from
the bottom" pattern so we can get some value even if the tip was removed
from under us. Since we expect the base file to change infrequently, this
should cover a large number of commits.

I can try to make this pattern more clear in a future revision, assuming
we stick with the pattern. It remains unclear if this strategy as a whole
has been accepted as a good direction.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux