Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:

> +GIT index format
> +================
> +
> +== The git index file format
> +
> + The git index file (.git/index) documents the status of the files
> + in the git staging area.
> +
> + The staging area is used for preparing commits, merging, etc.

The above two are not about "index file format". It is an explanation
of what the index is.

> + All binary numbers are in network byte order. Version 5 is described
> + here.

I had to read between these two lines something like "The index file
consists of various sections; the sections appear in the following
order in the file." to make sense of the document.

> + - A 20-byte header consisting of
> +
> + sig (32-bits): Signature:
> + The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
> +
> + vnr (32-bits): Version number:
> + The current supported versions are 2, 3, 4 and 5.
> +
> + ndir (32-bits): number of directories in the index.
> +
> + nfile (32-bits): number of file entries in the index.
> +
> + fblockoffset (32-bits): offset to the file block, relative to the
> + beginning of the file.

Ok.

> + - Offset to the extensions.
>
> + nextensions (32-bits): number of extensions.
> +
> + extoffset (32-bits): offset to the extension. (Possibly none, as
> + many as indicated in the 4-byte number of extensions)

OK.

> + headercrc (32-bits): crc checksum for the header and extension
> + offsets

This may have to have the same " - <section title>" at the same level
as "A 20-byte header" and "Offset to the ext"; as it stands, it looks
as if it is part of "Offset to the ext" which consists of 12 bytes.

> + - diroffsets (ndir * directory offsets): A directory offset for each
> + of the ndir directories in the index, sorted by pathname (of the
> + directory it's pointing to) (see below). The diroffsets are relative
> + to the beginning of the direntries block. [1]

"ndir * diroffsets" confused me.
I think you meant to say that this "diroffsets" section consists of
ndir entries of something and that each of that something is a
directory offset. It is unclear how "a directory offset" is
represented, except that it is "relative to the beginning of direntry
block" (and it is unclear what and where the direntry block is from
the information given up to this point) and the reader can guess it
is in "network byte order" (assuming it is a binary number). Perhaps

	diroffsets (ndir entries of "directory offset"): A 4-byte offset
	relative to the beginning of the "direntries block" (see below)
	for each of the ...

and drop the last sentence? Other tables may want to be adjusted in
a similar fashion.

> +== Directory offsets (diroffsets)
> +
> + diroffset (32-bits): offset to the directory relative to the beginning
> + of the index file. There are ndir + 1 offsets in the diroffset table,
> + the last is pointing to the end of the last direntry. With this last
> + entry, we can replace the strlen when reading each filename, by
> + calculating its length with the offsets.

The mention of "strlen" looks very out of place. The reader may be
able to guess that you want to say that the nth "string" is between
diroffset[n] and diroffset[n+1], and these "string"s are densely
packed so strlen(diroffset[n]) and diroffset[n+1]-diroffset[n] are
either the same thing (or with a fixed difference, if each "string"
is accompanied by some fixed-length data), but it is unclear what
these "strings" represent, especially because the name of the table
implies that you are talking about directories but strlen talks
about filename.

> +== Design explanations
> +
...
> +[3] The data of the cache-tree extension and the resolve undo
> + extension is now part of the index itself, but if other extensions
> + come up in the future, there is no need to change the index, they
> + can simply be added at the end.

Interesting.
When we added extensions, we said that there is no need to change the
index to add new features, they can simply be added at the end.

Perhaps the file offset table can be added as an extension to v2 to
give us the same bisectability, allowing us a single entry in-place
replacementability, without defining an entirely different format?
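The suggestion of carrying a file offset table in a v2 extension can be made concrete with a small C sketch. The framing (a 4-byte signature whose first byte in 'A'..'Z' marks the extension as optional, so readers that do not understand it skip it; a 32-bit size in network byte order; then the data, placed after the cache entries and before the trailing SHA-1) follows the existing v2 extension format. Everything else is an assumption for illustration only, not code from git: the signature "FOFF", a payload of nr + 1 offsets (one per cache entry plus an end marker, mirroring the ndir + 1 entries of the proposed diroffsets table), and the helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store a 32-bit number in network byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Load a 32-bit number stored in network byte order. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Build the hypothetical optional extension "FOFF": 4-byte signature,
 * 32-bit size, then nr + 1 offsets (one per cache entry plus one that
 * points just past the last entry). Returns a malloc'ed buffer and
 * stores its length in *out_len, or NULL on overflow or allocation
 * failure.
 */
static unsigned char *build_offset_ext(const uint32_t *offsets, uint32_t nr,
				       size_t *out_len)
{
	size_t data_len;
	unsigned char *buf;
	uint32_t i;

	assert(offsets != NULL && out_len != NULL);
	/* Keep both the 32-bit size field and the total length in range. */
	if (nr >= (UINT32_MAX - 8) / 4)
		return NULL;
	data_len = ((size_t)nr + 1) * 4;
	buf = malloc(8 + data_len);
	if (!buf)
		return NULL;
	memcpy(buf, "FOFF", 4); /* uppercase first byte: optional extension */
	put_be32(buf + 4, (uint32_t)data_len);
	for (i = 0; i <= nr; i++)
		put_be32(buf + 8 + 4 * (size_t)i, offsets[i]);
	*out_len = 8 + data_len;
	return buf;
}

/*
 * Length in bytes of cache entry n, read from the extension payload
 * (the bytes after the 8-byte extension header). Entries are assumed
 * to be densely packed, so the length is the difference of two
 * neighbouring offsets; no strlen() over the entry is needed.
 */
static uint32_t entry_length(const unsigned char *data, uint32_t n)
{
	uint32_t start = get_be32(data + 4 * (size_t)n);
	uint32_t end = get_be32(data + 4 * ((size_t)n + 1));

	assert(end >= start);
	return end - start;
}
```

With such a table a reader could bisect the sorted cache entries by jumping straight to the offset of the middle entry instead of parsing every entry from the start, and the length computation discussed for diroffsets reduces to a subtraction of neighbouring offsets.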