Re: [PATCH v3 0/6] Create commit-graph file format v2

Derrick Stolee <stolee@xxxxxxxxx> · Thu, 2 May 2019 09:26:04 -0400

On 5/1/2019 4:25 PM, Ævar Arnfjörð Bjarmason wrote:
> I won't repeat my outstanding v2 feedback about v1 & v2
> incompatibilities, except to say that I'd in principle be fine with
> having a v2 format the way this series is adding it. I.e. saying "here
> it is, it's never written by default, we'll figure out these compat
> issues later".
> 
> My only objection/nit on that point would be that the current
> docs/commit messages should make some mention of the really bad
> interactions between v1 and v2 on different git versions.

Good idea to add some warnings in the docs to say something like
"version 2 is not supported by Git 2.2x and earlier".

> However, having written this again I really don't understand why we need
> a v2 of this format at all.

[snip]

> How about we instead just don't change the header? I.e.:
> 
>  * Let's just live with "1" as the marker for SHA-1.
> 
>    Yeah it would be cute to use 0x73686131 instead like "struct
>    git_hash_algo", but we can live with a 1=0x73686131 ("sha1"),
>    2=0x73323536 ("s256") mapping somewhere. It's not like we're going to
>    be running into the 255 limit of hash algorithms Git will support any
>    time soon.

This was the intended direction of using the 1-byte value before, but
we have a preferred plan to use the 4-byte value in all future file formats.

>  * Don't add the reachability index version *to the header* or change
>    the reserved byte to be an error (see [1] again).

Since we can make the "corrected commit date" offset for a commit be
strictly larger than the offset of a parent, we can make it so an old client
will not give incorrect values when we use the new values. The only downside
would be that we would fail on 'git commit-graph verify' since the offsets
are not actually generation numbers in all cases.

> Instead we just add these things to new "chunks" as appropriate. As this
> patch of mine shows we can easily do that, and it doesn't error out on
> any existing version of git:
> https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95

I like the idea of a "metadata" chunk. This can be useful for a lot of things.
If we start the chunk with a "number of items" and only append items to the
list, we can dynamically grow the chunk as we add values.

> I now can't imagine a situation where we'd ever need to change the
> format. We have 32 bits of chunk ids to play with, and can have 255 of
> these chunks at a time, and unknown chunks are ignored by existing
> versions and future version.

The solutions you have discussed work for 2 of the 3 problems at hand.
The incremental file format is a high-value feature, but _would_ break
existing clients if they don't understand the extra data. Unless I am
missing something for how to succeed here.

> 1. See feedback on the v2 patch in
>    https://public-inbox.org/git/87muk6q98k.fsf@xxxxxxxxxxxxxxxxxxx/

My response [2] to that message includes the discussion of the
incremental file format.

[2] https://public-inbox.org/git/87muk6q98k.fsf@xxxxxxxxxxxxxxxxxxx/