On Fri, May 4, 2012 at 12:25 AM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
> GIT index format
> ================
>
> = The git index file has the following format
>
> All binary numbers are in network byte order. Version 5 is described
> here.
> ...
> - A number of directory offsets (see below). [1]
>
> - A number of sorted directories (see below). [2]
>
> - 32-bit crc32 checksum for the header, extension offsets and directories.

So we use one checksum for all dirs? I thought we could do a checksum per
dir, so if I'm only interested in path/to/here, I only need to verify the
data of three directories.

> == Directory entry offsets
>
> 32-bit offset to the directory.
>
> This part is needed for making the directory entries bisectable and
> thus allowing a binary search.

How is this (I assume) array ordered? The same top-down, depth-first order
as the "Directory entry" section below? I can see how a
top-down/breadth-first ordering would help bsearch, though.

> == Directory entry
>
> Directory entries are sorted in lexicographic order by the name
> of their path starting with the root.
>
> Path names (variable length) relative to top level directory (without the
> leading slash). '/' is used as path separator. '.' indicates the root
> directory. The special path components ".." and ".git" (without quotes)
> are disallowed. Trailing slash is also disallowed.
>
> 1 nul byte to terminate the path.

I don't see prefix compression mentioned here, nor in the "File entry"
section. Is it used here? If so, I don't think prefix compression plays
well with bsearch (on path name). In the worst case you may have to decode
every entry back to the first one just to get a single path name (e.g. a
directory with entries "a", "aa", "aaa", "aaaa"...).

> The entries are written out in the top-down, depth-first order. The
> first entry represents the root level of the repository, followed by
> the first subtree - let's call it A - of the root level, followed by
> the first subtree of A, ...
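To make the top-down, depth-first order concrete, here is a minimal C sketch. The `struct dir` type and `emit_depth_first()` are hypothetical illustrations, not Git code, and the sketch assumes each directory's subdirectories are already sorted by name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory directory node, for illustration only; this is
 * not how Git represents directories. */
struct dir {
	const char *path;              /* full path, "." for the root */
	const struct dir **subdirs;    /* NULL-terminated, assumed sorted by name */
};

/*
 * Append paths to `out` (space-separated) in top-down, depth-first
 * order: the directory itself first, then each subtree in full before
 * moving on to the next sibling.
 */
static void emit_depth_first(const struct dir *d, char *out, size_t outsz)
{
	size_t used = strlen(out);
	int n = snprintf(out + used, outsz - used, "%s ", d->path);

	assert(n >= 0 && (size_t)n < outsz - used); /* no truncation */
	for (const struct dir **c = d->subdirs; c && *c; c++)
		emit_depth_first(*c, out, outsz);
}
```

For a root with subdirectories A (containing A/x) and B, this produces `. A A/x B `: A's whole subtree is written before B.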
So depth-first traversal becomes natural even without the help of the
directory offset table above. Nice.

> == File entry
>
> File entries are sorted in ascending order on the name field, after the
> respective offset given by the directory entries.

I wonder if we need to keep the file entry table separate from the
directory entries. It feels more natural to put the sequence of file
entries of a directory right after its directory entry, which might also
help read-ahead during traversal. You save 4 bytes (for the file entry
offset) in each directory entry, and you still have the file offset table
for random access.

> File name (variable length). Nul bytes are not allowed in file names and
> they have no leading slash. They are 7-bit ASCII encoded.

Why can't it be 8-bit? I suppose the file name is also prefix compressed?

--
Duy
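For reference, a minimal C sketch of the two bsearch points raised above. `find_dir()` assumes the offset table lists names in sorted order and that names are stored in full, so every probe compares directly. `decode_name()` uses a made-up prefix-compression scheme (one byte giving the length shared with the previous name, then the NUL-terminated suffix); it is not what the proposal specifies, and is only there to show why reaching entry k means decoding every entry before it. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Binary search over 32-bit offsets, each pointing at a NUL-terminated
 * directory name inside `names`. Assumes offsets are listed in sorted
 * name order and names are stored uncompressed.
 * Returns the index of `path`, or -1 if it is absent.
 */
static int find_dir(const char *names, const uint32_t *offsets, int nr,
		    const char *path)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(path, names + offsets[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}

/*
 * Hypothetical prefix-compressed layout, for illustration only: decoding
 * entry k needs the full name of entry k-1, and so on back to entry 0.
 */
static void decode_name(const unsigned char *buf, int k, char *out, size_t outsz)
{
	const unsigned char *p = buf;

	out[0] = '\0';
	for (int i = 0; i <= k; i++) {
		size_t shared = *p++;
		size_t rest = strlen((const char *)p);

		assert(shared <= strlen(out) && shared + rest < outsz);
		memcpy(out + shared, p, rest + 1); /* keep prefix, copy suffix + NUL */
		p += rest + 1;
	}
}
```

With names ".", "A", "A/x", "B" at offsets 0, 2, 4, 8, `find_dir(..., "A/x")` returns 2 after a single probe. Decoding entry 3 of the "a", "aa", "aaa", "aaaa" example, by contrast, walks all four entries to produce "aaaa".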