Derrick Stolee <stolee@xxxxxxxxx> writes: > On 3/30/2018 9:25 AM, Jakub Narebski wrote: >> Derrick Stolee <stolee@xxxxxxxxx> writes: >> >>> +== graph-*.graph files have the following format: >> What is this '*' here? > > No longer necessary. It used to be a placeholder for a hash value, but > now the graph is stored in objects/info/commit-graph. All right. Excuse me replying to v4 instead of v6 of the patch series, where it would be answered or rather made moot already. >> >> [...] >>> + The remaining data in the body is described one chunk at a time, and >>> + these chunks may be given in any order. Chunks are required unless >>> + otherwise specified. >> Does Git need to understand all chunks, or could there be optional >> chunks that can be safely ignored (like in PNG format)? Though this may >> be overkill, and could be left for later revision of the format if >> deemed necessary. > > In v6, the format and design documents are edited to make clear the > use of optional chunks, specifically for future extension without > increasing the version number. That's good. >>> +CHUNK DATA: >>> + >>> + OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) >>> + The ith entry, F[i], stores the number of OIDs with first >>> + byte at most i. Thus F[255] stores the total >>> + number of commits (N). >> All right, it is small enough that can be required even for a very small >> number of commits. >> >>> + >>> + OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) >>> + The OIDs for all commits in the graph, sorted in ascending order. >>> + >>> + Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes) >> Do commits need to be put here in the ascending order of OIDs? > > Yes. > >> If so, this would mean that it is not possible to add information about >> new commits by only appending data and maybe overwriting some fields, I >> think. You would need to do full rewrite to insert new commit in >> appropriate place. > > That is the idea. This file is not updated with every new commit, but > instead will be updated on some scheduled cleanup events. The > commit-graph file is designed in a way to be non-critical, and not > tied to the packfile layout. This allows flexibility for when to do > the write. > > For example, in GVFS, we will write a new commit-graph when there are > new daily prefetch packs. > > This could also integrate with 'gc' and 'repack' so whenever they are > triggered the commit-graph is written as well. I wonder if it would be possible to use existing hooks... > Commits that do not exist in the commit-graph file will load from the > object database as normal (after a failed lookup in the commit-graph > file). Ah. I thought wrongly that it would (or at least could) be something that can be kept up to date, and extended when adding any new commit. >>> + * The first H bytes are for the OID of the root tree. >>> + * The next 8 bytes are for the int-ids of the first two parents >>> + of the ith commit. Stores value 0xffffffff if no parent in that >>> + position. If there are more than two parents, the second value >>> + has its most-significant bit on and the other bits store an array >>> + position into the Large Edge List chunk. >>> + * The next 8 bytes store the generation number of the commit and >>> + the commit time in seconds since EPOCH. The generation number >>> + uses the higher 30 bits of the first 4 bytes, while the commit >>> + time uses the 32 bits of the second 4 bytes, along with the lowest >>> + 2 bits of the lowest byte, storing the 33rd and 34th bit of the >>> + commit time. >>> + >>> + Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] >>> + This list of 4-byte values store the second through nth parents for >>> + all octopus merges. The second parent value in the commit data stores >>> + an array position within this list along with the most-significant bit >>> + on. Starting at that array position, iterate through this list of int-ids >>> + for the parents until reaching a value with the most-significant bit on. >>> + The other bits correspond to the int-id of the last parent. >> >> All right, that is one chunk that cannot use fixed-length records; this >> shouldn't matter much, as we iterate only up to the number of parents >> less two. > > Less one: the second "parent" column of the commit data chunk is used > to point into this list, so (P-1) parents are in this chunk for a > commit with P parents. Right. >> A question: what happens to the last list of parents? Is there a >> guardian value of 0xffffffff at last place? > > The termination condition is in the position of the last parent, since > the most-significant bit is on. The other 31 bits contain the int-id > of the parent. Ah. I have misunderstood the format: I thought that first entry is marked with most-significant bit set to 1, and all the rest to 0, while it is last entry (last parent) has most-significant bit set, while all others (if any) do not. So there is no need for guardian value. Best regards, -- Jakub Narębski