Derrick Stolee <stolee@xxxxxxxxx> writes: > Documentation/technical/commit-graph-format.txt | 90 +++++++++++++++++++++++++ > 1 file changed, 90 insertions(+) > create mode 100644 Documentation/technical/commit-graph-format.txt Hopefully just a few remaining nits. Overall I find this written really clearly. > +== graph-*.graph files have the following format: > + > +In order to allow extensions that add extra data to the graph, we organize > +the body into "chunks" and provide a binary lookup table at the beginning > +of the body. The header includes certain values, such as number of chunks, > +hash lengths and types. We no longer have lengths stored. > + ... > + The remaining data in the body is described one chunk at a time, and > + these chunks may be given in any order. Chunks are required unless > + otherwise specified. It is good that this explicitly says chunks can come in any order, and which ones are required. It should also say which chunk can appear multiple times. I think all four chunk types we currently define can have at most one instance in a file. > +CHUNK DATA: > + > + OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) > + The ith entry, F[i], stores the number of OIDs with first > + byte at most i. Thus F[255] stores the total > + number of commits (N). > + > + OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) > + The OIDs for all commits in the graph, sorted in ascending order. Somewhere in this document, we probably would want to say that this format allows at most (1<<31)-1 commits recorded in the file (as CGET and EDGE uses 31-bit uint to index into this table, using MSB for other purposes, and the all-1-bit pattern is also special), and when we refer to "int-ids" of a commit, it is this 31-bit number that is an index into this table. > + Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes) > + * The first H bytes are for the OID of the root tree. > + * The next 8 bytes are for the int-ids of the first two parents > + of the ith commit. Stores value 0xffffffff if no parent in that > + position. If there are more than two parents, the second value > + has its most-significant bit on and the other bits store an array > + position into the Large Edge List chunk. > + * The next 8 bytes store the generation number of the commit and > + the commit time in seconds since EPOCH. The generation number > + uses the higher 30 bits of the first 4 bytes, while the commit > + time uses the 32 bits of the second 4 bytes, along with the lowest > + 2 bits of the lowest byte, storing the 33rd and 34th bit of the > + commit time. > + > + Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] > + This list of 4-byte values store the second through nth parents for > + all octopus merges. The second parent value in the commit data stores > + an array position within this list along with the most-significant bit > + on. Starting at that array position, iterate through this list of int-ids > + for the parents until reaching a value with the most-significant bit on. > + The other bits correspond to the int-id of the last parent. > + > +TRAILER: > + > + H-byte HASH-checksum of all of the above. > +