Re: [PATCH v3 0/6] Create commit-graph file format v2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 02 2019, Derrick Stolee wrote:

> On 5/1/2019 4:25 PM, Ævar Arnfjörð Bjarmason wrote:
>> I won't repeat my outstanding v2 feedback about v1 & v2
>> incompatibilities, except to say that I'd in principle be fine with
>> having a v2 format the way this series is adding it. I.e. saying "here
>> it is, it's never written by default, we'll figure out these compat
>> issues later".
>>
>> My only objection/nit on that point would be that the current
>> docs/commit messages should make some mention of the really bad
>> interactions between v1 and v2 on different git versions.
>
> Good idea to add some warnings in the docs to say something like
> "version 2 is not supported by Git 2.2x and earlier".
>
>> However, having written this again I really don't understand why we need
>> a v2 of this format at all.
>
> [snip]
>
>> How about we instead just don't change the header? I.e.:
>>
>>  * Let's just live with "1" as the marker for SHA-1.
>>
>>    Yeah it would be cute to use 0x73686131 instead like "struct
>>    git_hash_algo", but we can live with a 1=0x73686131 ("sha1"),
>>    2=0x73323536 ("s256") mapping somewhere. It's not like we're going to
>>    be running into the 255 limit of hash algorithms Git will support any
>>    time soon.
>
> This was the intended direction of using the 1-byte value before, but
> we have a preferred plan to use the 4-byte value in all future file formats.

Right, and I wouldn't argue about such a pointless thing for a future
file format.

But since the v1->v2 migration story is so unfriendly already for
reasons that can't be helped at this point (existing released versions)
I think we need to weigh the trade-offs of changing the header v.s. just
doing the conceptually less clean thing that allows existing clients a
painless transition.

>>  * Don't add the reachability index version *to the header* or change
>>    the reserved byte to be an error (see [1] again).
>
> Since we can make the "corrected commit date" offset for a commit be
> strictly larger than the offset of a parent, we can make it so an old client
> will not give incorrect values when we use the new values. The only downside
> would be that we would fail on 'git commit-graph verify' since the offsets
> are not actually generation numbers in all cases.

Aren't you talking about how the *content* (presumably in the chunk part
of the graph) is going to look like? I just mean these couple of bytes
in the header, again as a proxy discussion for "do we *really* need to
change this?".

>> Instead we just add these things to new "chunks" as appropriate. As this
>> patch of mine shows we can easily do that, and it doesn't error out on
>> any existing version of git:
>> https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95
>
> I like the idea of a "metadata" chunk. This can be useful for a lot of things.
> If we start the chunk with a "number of items" and only append items to the
> list, we can dynamically grow the chunk as we add values.

Right. I like it too :) But right now I'm just using it as a demo for
how new arbitrary chunk data can be added to the v1 format in backwards
compatible ways.

My inclination for an actual version of that patch would be to make it
easier to read/extend (even just dump JSON there, or a series of
key/values) over micro-optimizing the storage size. Such metadata will
always be tiny v.s. the rest, but that's for later bikeshedding...

>> I now can't imagine a situation where we'd ever need to change the
>> format. We have 32 bits of chunk ids to play with, and can have 255 of
>> these chunks at a time, and unknown chunks are ignored by existing
>> versions and future version.
>
> The solutions you have discussed work for 2 of the 3 problems at hand.
> The incremental file format is a high-value feature, but _would_ break
> existing clients if they don't understand the extra data. Unless I am
> missing something for how to succeed here.

We would write out a file like this:

    <CGPH signature>
    <rest of v1 header incl. chunk offsets (but higher chunk count)>
    <chunks git understands now>
    <new chunks>
    <signature>

Where one of the new chunks could be INCC ("incremental count") or
whatever, serving the same purpose as the v2 modification of the header
to use the padding byte for the count. Then we'd have more chunks with
the incremental data (or pack it into one "magic" chunk with its own
format...).

Existing clients deal with the graph being incomplete, so the writer
could just not bother to update that part of the data and a newer
clients would know to find the rest in a series of incremental updates.

IOW an "empty" commit-graph now is 1100 bytes. Worst case we'd be
writing at least that number of bytes that would be mostly or entirely
useless to older clients, with the rest being new stuff newer clients
understand.

On the incremental format: I don't like:

 1) The idea that an incremental format would involve in-place
    modification of an existing file (or would we write a completely new
    one and move it in-place?).

    If it's in-place modification we get away a lot of things/avoid
    complexity with "we might delete, but we never modify existing" on
    existing *.{idx,bitmap} formats. E.g. mmap() is a royal PITA
    (more-so than now) once you need to deal with modifications.

    Also, if it's in-place we'd need to fully recompute the checksum
    SHA-1 as we modify the file.

 2) The assumption that we'd just have 255 of these, wouldn't it be
    reasonable to have a MIDX-like for it & write it along with packs as
    they come in? I.e. eventually have PACK.{pack,idx,bitmap,graph}.

    We support more than 255 packs, and it seems likely that we'd
    eventually want to generate/coalesce packs/idx and any other
    side-indexes on the same "gc" schedule.

But those are separate from any back-compat concerns, which is what I
think makes sense to focus on now.

>> 1. See feedback on the v2 patch in
>>    https://public-inbox.org/git/87muk6q98k.fsf@xxxxxxxxxxxxxxxxxxx/
>
> My response [2] to that message includes the discussion of the
> incremental file format.
>
> [2] https://public-inbox.org/git/87muk6q98k.fsf@xxxxxxxxxxxxxxxxxxx/




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux