Derrick Stolee <stolee@xxxxxxxxx> writes:

> Add document specifying the binary format for packed graphs. This
> format allows for:
>
> * New versions.
> * New hash functions and hash lengths.
> * Optional extensions.
>
> Basic header information is followed by a binary table of contents
> into "chunks" that include:
>
> * An ordered list of commit object IDs.
> * A 256-entry fanout into that list of OIDs.
> * A list of metadata for the commits.
> * A list of "large edges" to enable octopus merges.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  Documentation/technical/graph-format.txt | 88 ++++++++++++++++++++++++++++++++
>  1 file changed, 88 insertions(+)
>  create mode 100644 Documentation/technical/graph-format.txt
>
> diff --git a/Documentation/technical/graph-format.txt b/Documentation/technical/graph-format.txt
> new file mode 100644
> index 0000000000..a15e1036d7
> --- /dev/null
> +++ b/Documentation/technical/graph-format.txt
> @@ -0,0 +1,88 @@
> +Git commit graph format
> +=======================

Good that this is not saying "graph format" but is explicit that it
is about "commit".  Do the same for the previous steps.  Especially,
a builtin/graph.c that does not have much to do with graph.c is not
a good way forward ;-)

I do like the fact that later parents of octopus merges are moved
out of the way to make the majority of records fixed length, but I
am not sure if "up to two parents are recorded in line" is truly the
best arrangement.  Aren't the majority of commits single-parent,
thereby wasting 4 bytes almost always?

Will 32 bits remain enough for everybody?  Wouldn't it make sense to
at least define them to be indices into arrays (i.e. scaled to
element size), not "offsets", to recover a few lost bits?

What's the point of storing object id length?  If you do not
understand the object ID scheme, knowing only the length would not
do you much good anyway, no?
And if you know the hashing scheme specified by Object ID version, you already know the length, no?
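
To illustrate the index-versus-offset point above, here is a rough
sketch, not code from the patch.  RECORD_SIZE and all function names
are made up for illustration; the real record size is whatever the
format document ends up specifying.  With a hypothetical 36-byte
record, a 32-bit byte offset reaches about 2^32 / 36 records, while
a 32-bit index reaches 2^32 of them, which is roughly five bits
recovered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed record size in bytes; not taken from the patch. */
#define RECORD_SIZE 36u

/*
 * Byte position of record `index` in a table of `nr` records, when
 * the stored 32-bit value is an element index scaled by record size.
 */
static uint64_t record_pos(uint32_t index, uint32_t nr)
{
	assert(index < nr);
	return (uint64_t)index * RECORD_SIZE;
}

/* Records reachable when the 32-bit field holds a byte offset. */
static uint64_t reach_as_offset(void)
{
	return ((uint64_t)1 << 32) / RECORD_SIZE;
}

/* Records reachable when the same field holds an element index. */
static uint64_t reach_as_index(void)
{
	return (uint64_t)1 << 32;
}
```

The multiplication is done in 64 bits so that a large index does not
wrap around before it is turned into a byte position.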