Re: [PATCH 04/14] packed-graph: add format document

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 25 Jan 2018 14:06:15 -0800

Derrick Stolee <stolee@xxxxxxxxx> writes:

> Add document specifying the binary format for packed graphs. This
> format allows for:
>
> * New versions.
> * New hash functions and hash lengths.
> * Optional extensions.
>
> Basic header information is followed by a binary table of contents
> into "chunks" that include:
>
> * An ordered list of commit object IDs.
> * A 256-entry fanout into that list of OIDs.
> * A list of metadata for the commits.
> * A list of "large edges" to enable octopus merges.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  Documentation/technical/graph-format.txt | 88 ++++++++++++++++++++++++++++++++
>  1 file changed, 88 insertions(+)
>  create mode 100644 Documentation/technical/graph-format.txt
>
> diff --git a/Documentation/technical/graph-format.txt b/Documentation/technical/graph-format.txt
> new file mode 100644
> index 0000000000..a15e1036d7
> --- /dev/null
> +++ b/Documentation/technical/graph-format.txt
> @@ -0,0 +1,88 @@
> +Git commit graph format
> +=======================

Good that this is not saying "graph format" but is explicit that it
is about "commit".  Do the same for the earlier steps in the series.
In particular, a builtin/graph.c that has little to do with the
existing graph.c is not a good way forward ;-)

I do like the fact that later parents of octopus merges are moved
out of the way to make the majority of records fixed length, but I
am not sure if recording "up to two parents inline" is truly the
best arrangement.  Aren't the majority of commits single-parent,
thereby almost always wasting 4 bytes?
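To put a rough number on that, here is a sketch under a hypothetical
layout in which every fixed-length record reserves exactly two 32-bit
inline parent slots, with parents beyond the second of an octopus
living in the "large edges" chunk.  Root commits then leave both slots
unused and single-parent commits leave one.  The names and the commit
counts below are made up for illustration, not taken from the patch
or measured from any repository.

```c
#include <stdint.h>

/* Hypothetical layout: two inline parent slots of 4 bytes each per
 * record.  Merges (including octopus merges, whose extra parents go to
 * the "large edges" chunk) fill both slots; roots fill none; ordinary
 * commits fill one. */
#define INLINE_PARENT_SLOTS 2u
#define PARENT_SLOT_BYTES   4u

/* Bytes spent on inline parent slots that hold no parent at all.
 * Merge commits waste nothing, so they are not a parameter. */
static uint64_t wasted_parent_bytes(uint64_t root_commits,
                                    uint64_t single_parent_commits)
{
	return root_commits * INLINE_PARENT_SLOTS * PARENT_SLOT_BYTES +
	       single_parent_commits * (INLINE_PARENT_SLOTS - 1) *
	       PARENT_SLOT_BYTES;
}
```

With a made-up history of 1 root, 900,000 single-parent commits and
99,999 merges, wasted_parent_bytes(1, 900000) comes to 3,600,008
bytes, i.e. about 3.6 bytes per commit on average, all of it padding.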

Will 32 bits remain enough for everybody?  Wouldn't it make sense
to at least define them as indices into arrays (i.e. scaled by the
element size) rather than "offsets", to recover a few lost bits?
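For a table of 20-byte SHA-1 object IDs, the difference is a little
over 4 bits (log2 of 20).  A sketch, assuming a hypothetical 32-bit
field that locates an entry in such a fixed-size table; the helper
names are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Entries reachable when a 32-bit field holds a byte offset: each entry
 * start (0, elem_size, 2*elem_size, ...) must fit in [0, UINT32_MAX].
 * elem_size must be non-zero. */
static uint64_t max_entries_by_offset(uint32_t elem_size)
{
	return (uint64_t)UINT32_MAX / elem_size + 1;
}

/* The same field holding an index can name every value 0..UINT32_MAX. */
static uint64_t max_entries_by_index(void)
{
	return (uint64_t)UINT32_MAX + 1;
}

/* Resolving an index: scale by the element size at lookup time. */
static const unsigned char *entry_at(const unsigned char *table,
                                     uint32_t index, size_t elem_size)
{
	return table + (size_t)index * elem_size;
}
```

With 20-byte entries, max_entries_by_offset(20) is 214,748,365 while
max_entries_by_index() is 4,294,967,296, roughly 20 times the reach
for the same on-disk field width.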

What's the point of storing the object ID length?  If you do not
understand the object ID scheme, knowing only the length would not
do you much good anyway, would it?  And if you know the hashing
scheme specified by the Object ID version, you already know the
length, no?
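A sketch of what a reader's header check would look like, assuming
hypothetical one-byte "version" and "length" header fields and a
made-up numbering in which version 1 means SHA-1; the stored length can
only confirm what the version already implies:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numbering: version 1 means SHA-1 (20-byte IDs).  No
 * other versions are assumed here. */
static size_t oid_len_for_version(uint8_t oid_version)
{
	switch (oid_version) {
	case 1:
		return 20; /* SHA-1 */
	default:
		return 0;  /* unknown hashing scheme */
	}
}

/* Returns 0 if the header is usable, -1 otherwise.  On an unknown
 * version the file is unusable whatever the stored length says; on a
 * known version the stored length is redundant. */
static int check_oid_header(uint8_t oid_version, uint8_t stored_oid_len)
{
	size_t expected = oid_len_for_version(oid_version);

	if (!expected)
		return -1; /* cannot interpret the object IDs anyway */
	return stored_oid_len == expected ? 0 : -1;
}
```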