Re: [PATCH v4 01/13] commit-graph: add format document

Derrick Stolee <stolee@xxxxxxxxx> · Mon, 2 Apr 2018 09:09:29 -0400

On 3/30/2018 9:25 AM, Jakub Narebski wrote:
Derrick Stolee <stolee@xxxxxxxxx> writes:

+== graph-*.graph files have the following format:
What is this '*' here?

No longer necessary. It used to be a placeholder for a hash value, but 
now the graph is stored in objects/info/commit-graph.

[...]
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
Does Git need to understand all chunks, or could there be optional
chunks that can be safely ignored (like in PNG format)?  Though this may
be overkill, and could be left for later revision of the format if
deemed necessary.

In v6, the format and design documents are edited to make clear the use 
of optional chunks, specifically for future extension without increasing 
the version number.

+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+      The ith entry, F[i], stores the number of OIDs with first
+      byte at most i. Thus F[255] stores the total
+      number of commits (N).
All right, it is small enough that can be required even for a very small
number of commits.

+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+      The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
Do commits need to be put here in the ascending order of OIDs?

Yes.

If so, this would mean that it is not possible to add information about
new commits by only appending data and maybe overwriting some fields, I
think.  You would need to do full rewrite to insert new commit in
appropriate place.

That is the idea. This file is not updated with every new commit, but 
instead will be updated on some scheduled cleanup events. The 
commit-graph file is designed in a way to be non-critical, and not tied 
to the packfile layout. This allows flexibility for when to do the write.

For example, in GVFS, we will write a new commit-graph when there are 
new daily prefetch packs.

This could also integrate with 'gc' and 'repack' so whenever they are 
triggered the commit-graph is written as well.

Commits that do not exist in the commit-graph file will load from the 
object database as normal (after a failed lookup in the commit-graph file).

+    * The first H bytes are for the OID of the root tree.
+    * The next 8 bytes are for the int-ids of the first two parents
+      of the ith commit. Stores value 0xffffffff if no parent in that
+      position. If there are more than two parents, the second value
+      has its most-significant bit on and the other bits store an array
+      position into the Large Edge List chunk.
+    * The next 8 bytes store the generation number of the commit and
+      the commit time in seconds since EPOCH. The generation number
+      uses the higher 30 bits of the first 4 bytes, while the commit
+      time uses the 32 bits of the second 4 bytes, along with the lowest
+      2 bits of the lowest byte, storing the 33rd and 34th bit of the
+      commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+      This list of 4-byte values store the second through nth parents for
+      all octopus merges. The second parent value in the commit data stores
+      an array position within this list along with the most-significant bit
+      on. Starting at that array position, iterate through this list of int-ids
+      for the parents until reaching a value with the most-significant bit on.
+      The other bits correspond to the int-id of the last parent.
All right, that is one chunk that cannot use fixed-length records; this
shouldn't matter much, as we iterate only up to the number of parents
less two.

Less one: the second "parent" column of the commit data chunk is used to 
point into this list, so (P-1) parents are in this chunk for a commit 
with P parents.

A question: what happens to the last list of parents?  Is there a
guardian value of 0xffffffff at last place?

The termination condition is in the position of the last parent, since 
the most-significant bit is on. The other 31 bits contain the int-id of 
the parent.

Thanks,
-Stolee