Re: Index format v5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 






On 05/06, Nguyen Thai Ngoc Duy wrote:
> On Fri, May 4, 2012 at 12:25 AM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
> > == Directory entry
> >
> >  Directory entries are sorted in lexicographic order by the name
> >  of their path starting with the root.
> >
> >  Path names (variable length) relative to top level directory (without the
> >    leading slash). '/' is used as path separator. '.' indicates the root
> >    directory. The special patch components ".." and ".git" (without quotes)
> >    are disallowed. Trailing slash is also disallowed.
> >
> >  1 nul byte to terminate the path.
> >
> >  32-bit offset to the first file of a directory
> >
> >  32-bit offset to conflicted/resolved data at the end of the index.
> >    0 if there is no such data. [4]
> 
> If it's non-zero, how do we know how many conflict entries we have?

Right, I'll add this to the next version. 

(32-bit number of conflicted/resolved data entries at the end of the
  index if the offset is non 0.)

> >  4-byte number of subtrees this tree has
> 
> let's name this nr_subtrees
> 
> >  4-byte number of entries in the index that is covered by the tree this
> >    entry represents. (entry_count) (-1 if the entry is invalid)
> 
> and this nr_entries.
> 
> So how do we know how many entries (including all dirs, files, staged
> files) this directory has? I assume if enry_count != -1, the number
> would be nr_subtrees + nr_entries (or just nr_entries, depending on
> your definition). When entry_count == -1, how do we calculate this
> number?

You're right. In order to maintain the bisectability we'll have to
keep the entry count up to date.

> >  160-bit object name for the object that would result from writing
> >    this span of index as a tree.
> >
> >  The last 24 bytes are for the cache tree. An entry can be in an
> >    invalidated state which is represented by having -1 in the entry_count
> >    field. If an entry is in invalidated state, the next entry will begin
> >    after the number of subtrees, and the 160-bit object name is dropped.
> >
> >  The entries are written out in the top-down, depth-first order. The
> >    first entry represents the root level of the repository, followed by
> >    the first subtree - let's call it A - of the root level, followed by
> >    the first subtree of A, ...
> 
> Assume the command is "git diff -- path/to/h*", we don't need full
> index, just stuff in "path/to/h*" from the index. I'm trying to see
> how to load just those paths from index, not full index.
> 
> I assume again that you won't invent a new function and use
> tree_entry_interesting() to do tree pruning while loading index.
> t_e_i() is designed to read tree objects. But I think we can make it
> read on-disk directory/file entries with a few small changes. t_e_i()
> is recursive and fits quite well with depth-first directory layout in
> the proposed index format.
> 
> I have difficulties figuring out how you skip subtrees though. Assume
> we are at "path" and we are not interested in anything there until we
> meet "path/to", how do you skip subtrees "path/abc" and "path/def"?
> Processing directory entries sequentially will eventually get us to
> "path/to", but that could be a lot of entries if "path/abc" is deep. A
> file offset pointer to the next sibling directory entry might help.
> Does such a pointer exist but I did not see it, or you have other
> means to do this?

I have changed the index format slightly, not to use prefix compression
on the directory entries, so that binary search through the index gets
simple. Using the directory offsets, we can binary search to the
path/to. We always have log(n) (n is the number of directories) search
time for each path.

> Also the file/dir separation makes it more difficult to match the last
> "h*" part, if there are both "here" directory and "howto" file.

It doesn't make it more difficult to find, since we can first just
check if there exists the directory with a binary search (and
possibly more directories, around it) and then search for the file
in the superdirectory (can you say that?) with another binary search.

-- 
Thomas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]