Re: [PATCH v4 01/13] commit-graph: add format document

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Derrick Stolee <stolee@xxxxxxxxx> writes:

> On 3/30/2018 9:25 AM, Jakub Narebski wrote:
>> Derrick Stolee <stolee@xxxxxxxxx> writes:
>>
>>> +== graph-*.graph files have the following format:
>> What is this '*' here?
>
> No longer necessary. It used to be a placeholder for a hash value, but
> now the graph is stored in objects/info/commit-graph.

All right.

Excuse me replying to v4 instead of v6 of the patch series, where it
would be answered or rather made moot already.

>>
>> [...]
>>> +  The remaining data in the body is described one chunk at a time, and
>>> +  these chunks may be given in any order. Chunks are required unless
>>> +  otherwise specified.
>> Does Git need to understand all chunks, or could there be optional
>> chunks that can be safely ignored (like in PNG format)?  Though this may
>> be overkill, and could be left for later revision of the format if
>> deemed necessary.
>
> In v6, the format and design documents are edited to make clear the
> use of optional chunks, specifically for future extension without
> increasing the version number.

That's good.

>>> +CHUNK DATA:
>>> +
>>> +  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
>>> +      The ith entry, F[i], stores the number of OIDs with first
>>> +      byte at most i. Thus F[255] stores the total
>>> +      number of commits (N).
>> All right, it is small enough that can be required even for a very small
>> number of commits.
>>
>>> +
>>> +  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
>>> +      The OIDs for all commits in the graph, sorted in ascending order.
>>> +
>>> +  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
>> Do commits need to be put here in the ascending order of OIDs?
>
> Yes.
>
>> If so, this would mean that it is not possible to add information about
>> new commits by only appending data and maybe overwriting some fields, I
>> think.  You would need to do full rewrite to insert new commit in
>> appropriate place.
>
> That is the idea. This file is not updated with every new commit, but
> instead will be updated on some scheduled cleanup events. The
> commit-graph file is designed in a way to be non-critical, and not
> tied to the packfile layout. This allows flexibility for when to do
> the write.
>
> For example, in GVFS, we will write a new commit-graph when there are
> new daily prefetch packs.
>
> This could also integrate with 'gc' and 'repack' so whenever they are
> triggered the commit-graph is written as well.

I wonder if it would be possible to use existing hooks...

> Commits that do not exist in the commit-graph file will load from the
> object database as normal (after a failed lookup in the commit-graph
> file).

Ah. I thought wrongly that it would (or at least could) be something
that can be kept up to date, and extended when adding any new commit.

>>> +    * The first H bytes are for the OID of the root tree.
>>> +    * The next 8 bytes are for the int-ids of the first two parents
>>> +      of the ith commit. Stores value 0xffffffff if no parent in that
>>> +      position. If there are more than two parents, the second value
>>> +      has its most-significant bit on and the other bits store an array
>>> +      position into the Large Edge List chunk.
>>> +    * The next 8 bytes store the generation number of the commit and
>>> +      the commit time in seconds since EPOCH. The generation number
>>> +      uses the higher 30 bits of the first 4 bytes, while the commit
>>> +      time uses the 32 bits of the second 4 bytes, along with the lowest
>>> +      2 bits of the lowest byte, storing the 33rd and 34th bit of the
>>> +      commit time.
>>> +
>>> +  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
>>> +      This list of 4-byte values store the second through nth parents for
>>> +      all octopus merges. The second parent value in the commit data stores
>>> +      an array position within this list along with the most-significant bit
>>> +      on. Starting at that array position, iterate through this list of int-ids
>>> +      for the parents until reaching a value with the most-significant bit on.
>>> +      The other bits correspond to the int-id of the last parent.
>>
>> All right, that is one chunk that cannot use fixed-length records; this
>> shouldn't matter much, as we iterate only up to the number of parents
>> less two.
>
> Less one: the second "parent" column of the commit data chunk is used
> to point into this list, so (P-1) parents are in this chunk for a
> commit with P parents.

Right.

>> A question: what happens to the last list of parents?  Is there a
>> guardian value of 0xffffffff at last place?
>
> The termination condition is in the position of the last parent, since
> the most-significant bit is on. The other 31 bits contain the int-id
> of the parent.

Ah. I have misunderstood the format: I thought that first entry is
marked with most-significant bit set to 1, and all the rest to 0, while
it is last entry (last parent) has most-significant bit set, while all
others (if any) do not. So there is no need for guardian value.

Best regards,
-- 
Jakub Narębski




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux