Re: Index format v5

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> · Sun, 6 May 2012 17:23:58 +0700

On Fri, May 4, 2012 at 12:25 AM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
> == Directory entry
>
>  Directory entries are sorted in lexicographic order by the name
>  of their path starting with the root.
>
>  Path names (variable length) relative to top level directory (without the
>    leading slash). '/' is used as path separator. '.' indicates the root
>    directory. The special path components ".." and ".git" (without quotes)
>    are disallowed. Trailing slash is also disallowed.
>
>  1 nul byte to terminate the path.
>
>  32-bit offset to the first file of a directory
>
>  32-bit offset to conflicted/resolved data at the end of the index.
>    0 if there is no such data. [4]

If it's non-zero, how do we know how many conflict entries we have?

>  4-byte number of subtrees this tree has

let's name this nr_subtrees

>  4-byte number of entries in the index that is covered by the tree this
>    entry represents. (entry_count) (-1 if the entry is invalid)

and this nr_entries.

So how do we know how many entries (directories, files and staged
files together) this directory has? I assume that if entry_count != -1,
the number is nr_subtrees + nr_entries (or just nr_entries, depending
on your definition). When entry_count == -1, how do we calculate this
number?

>  160-bit object name for the object that would result from writing
>    this span of index as a tree.
>
>  The last 24 bytes are for the cache tree. An entry can be in an
>    invalidated state which is represented by having -1 in the entry_count
>    field. If an entry is in invalidated state, the next entry will begin
>    after the number of subtrees, and the 160-bit object name is dropped.
>
>  The entries are written out in the top-down, depth-first order. The
>    first entry represents the root level of the repository, followed by
>    the first subtree - let's call it A - of the root level, followed by
>    the first subtree of A, ...
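
To check my reading of the layout above, here is a rough Python sketch
of a reader for these directory entries. It is not git code. The
big-endian byte order, the field order (taken as listed above), the
dict keys and the function names are all my assumptions; the only rule
taken from the proposal is that the 160-bit object name is absent when
entry_count is -1.

```python
import struct

# Assumed layout after the NUL-terminated path: two 32-bit offsets,
# the subtree count, then the signed entry count (-1 means invalid).
FIXED = struct.Struct(">IIIi")

def parse_dir_entry(buf, pos=0):
    """Parse one directory entry at pos; return (entry, next_pos)."""
    end = buf.index(b"\0", pos)              # path is NUL-terminated
    path = buf[pos:end].decode("utf-8")
    first_file, conflict, nr_subtrees, nr_entries = FIXED.unpack_from(buf, end + 1)
    pos = end + 1 + FIXED.size
    object_name = None
    if nr_entries != -1:                     # valid entry: object name follows
        object_name = buf[pos:pos + 20]
        pos += 20
    entry = {
        "path": path,
        "first_file_offset": first_file,
        "conflict_offset": conflict,
        "nr_subtrees": nr_subtrees,
        "nr_entries": nr_entries,
        "object_name": object_name,
    }
    return entry, pos

def parse_all(buf):
    """Read entries back to back, in the depth-first order they were written."""
    pos, entries = 0, []
    while pos < len(buf):
        entry, pos = parse_dir_entry(buf, pos)
        entries.append(entry)
    return entries
```

Note that every entry has to be parsed just to find where the next one
starts, because the entry size depends on the path length and on
whether the entry is valid.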

Suppose the command is "git diff -- path/to/h*". We don't need the full
index, just the entries matching "path/to/h*". I'm trying to see how to
load only those paths instead of the whole index.

I assume again that you won't invent a new function and will use
tree_entry_interesting() to prune trees while loading the index.
t_e_i() is designed to read tree objects, but I think we can make it
read on-disk directory/file entries with a few small changes. t_e_i()
is recursive and fits quite well with the depth-first directory layout
in the proposed index format.

I have difficulty figuring out how you skip subtrees, though. Assume
we are at "path" and are not interested in anything there until we
reach "path/to". How do you skip the subtrees "path/abc" and
"path/def"? Processing directory entries sequentially will eventually
get us to "path/to", but that could be a lot of entries if "path/abc"
is deep. A file offset pointing to the next sibling directory entry
might help. Does such a pointer exist and I missed it, or do you have
another way to do this?
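
To illustrate what I mean by a next-sibling pointer, here is a rough
Python sketch, not git code. It works on a plain list of directory
paths in depth-first order and uses list indexes where the real format
would store file offsets. The names is_under, next_sibling_index and
find_dir are made up for this example.

```python
def is_under(child, parent):
    """True if directory `child` lies strictly below directory `parent`."""
    if parent == ".":                 # "." denotes the root directory
        return child != "."
    return child.startswith(parent + "/")

def next_sibling_index(dirs):
    """For each entry, the index of the first later entry outside its subtree."""
    n = len(dirs)
    skip = [n] * n                    # default: subtree runs to the end
    stack = []                        # indexes of the currently open ancestors
    for i, d in enumerate(dirs):
        while stack and not is_under(d, dirs[stack[-1]]):
            skip[stack.pop()] = i     # entry i is the first one past that subtree
        stack.append(i)
    return skip

def find_dir(dirs, skip, target):
    """Return (index of target or None, number of entries looked at)."""
    i = visited = 0
    while i < len(dirs):
        visited += 1
        if dirs[i] == target:
            return i, visited
        # Descend only into ancestors of the target; jump over anything else.
        i = i + 1 if is_under(target, dirs[i]) else skip[i]
    return None, visited
```

With such a pointer, finding "path/to" looks at one entry per
uninteresting sibling ("path/abc", "path/def") instead of one entry per
directory below them.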

Also, the file/dir separation makes it more difficult to match the
last "h*" component if there are both a "here" directory and a "howto"
file.
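
A rough Python sketch of the extra step this implies, not git code:
with subdirectories and files stored in two separately sorted lists,
the last pattern component has to be matched against both lists and
the hits merged back into one ordered stream. fnmatchcase() is only a
stand-in for the real pathspec matching, and match_level, the (name,
kind) tuples and the plain name ordering are assumptions for this
example.

```python
from fnmatch import fnmatchcase
from heapq import merge

def match_level(subdirs, files, pattern):
    """Match one path component against both sorted lists of a directory.

    Both inputs must already be sorted by name. Returns (name, kind)
    tuples in name order, with kind being "dir" or "file".
    """
    dir_hits = ((name, "dir") for name in subdirs if fnmatchcase(name, pattern))
    file_hits = ((name, "file") for name in files if fnmatchcase(name, pattern))
    # heapq.merge interleaves the two sorted streams without re-sorting them.
    return list(merge(dir_hits, file_hits))
```

For a directory with subdirectories "abc" and "here" and files
"README" and "howto", matching "h*" has to visit both lists to return
"here" followed by "howto".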
-- 
Duy