Re: Index format v5

Thomas Rast <trast@xxxxxxxxxxxxxxx> · Thu, 3 May 2012 21:11:38 +0200

Junio C Hamano <gitster@xxxxxxxxx> writes:

>> [6] The length of the file name was dropped, since each file name is
>>     nul terminated anyway.
>
> This is micronit, but I think we do this to save one strlen() for each
> read of the entry, except for unusually long paths where we fall back to
> strlen(). A change like this needs to be justified better than simply
> saying "because we _could_ compute in a different way by spending extra
> cycles".

(Partially also answering the whole "what are these offsets" confusion)

The bisectability design has evolved to the point where we envision each
"flat list" (the list of directories, and the list of files in each
directory) having the format

  offset to entry 1 [1]
  ....
  offset to entry n

  entry 1, consisting of:
    name, nul-terminated
    rest of data:
      for dirs: cache-tree sha1, offset to files, etc.
      for files: stat data, content sha1, flags, etc.

That makes bisection very easy: the offsets point at the start of each
string, so you just strcmp() and get on with it.
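
As a rough sketch of that lookup (not code from any actual patch): the
following assumes the offsets are absolute, 32 bits wide and stored in
host byte order, none of which is settled above.  The names read_offset()
and find_entry() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed layout of one "flat list" in buf:
 *   n offsets (uint32_t, absolute from the start of buf, host byte order)
 *   n entries, each starting with a NUL-terminated name
 * Entries are assumed to be sorted by name in strcmp() order.
 */
static uint32_t read_offset(const unsigned char *buf, size_t i)
{
	uint32_t off;

	/* memcpy avoids unaligned access into the mapped file */
	memcpy(&off, buf + i * sizeof(off), sizeof(off));
	return off;
}

/* Return a pointer to the start of the entry called name, or NULL. */
static const unsigned char *find_entry(const unsigned char *buf, size_t n,
				       const char *name)
{
	size_t lo = 0, hi = n;

	assert(buf && name);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const unsigned char *entry = buf + read_offset(buf, mid);
		/* the offset points straight at the name, so just compare */
		int cmp = strcmp(name, (const char *)entry);

		if (!cmp)
			return entry;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}
```

Note that the search never needs to know how long the rest of an entry
is; only the offset table and the name at the start of each entry are
touched.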

On the other hand, by the time you can look at the flags, it's too late
for the strlen() optimization anyway, so meh.  If you think it's
important, we can perhaps lay it out so the rest of the data goes
immediately after the pointer.  But so far it wasn't clear whether that
data is fixed-size or uses some smart compression scheme.  Tonight's
edition has it fixed-length, so perhaps it would be preferable to keep
the name length after all.
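
To illustrate the alternative layout (again only a sketch): suppose the
fixed-size data comes first and the name follows it, with the name
length kept in the low 12 bits of a flags field.  The struct, its
fields, the 12-bit mask and entry_name_len() are all assumptions made up
for this example, not part of the proposed format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the low 12 bits of flags hold the name length. */
#define NAME_LEN_MASK 0x0fffu

/* Hypothetical fixed-size part, stored right where the offset points. */
struct entry_head {
	uint16_t flags;
	unsigned char sha1[20];
};

/*
 * Return the length of the name that follows the fixed-size part.
 * The stored length is used when it fits; a saturated length field
 * means "unusually long path", and only then do we pay for strlen().
 */
static size_t entry_name_len(const unsigned char *entry)
{
	struct entry_head head;
	const char *name;
	size_t len;

	assert(entry);
	memcpy(&head, entry, sizeof(head));
	name = (const char *)entry + sizeof(head);
	len = head.flags & NAME_LEN_MASK;
	if (len < NAME_LEN_MASK)
		return len;
	/* at least NAME_LEN_MASK bytes are known to be non-NUL */
	return NAME_LEN_MASK + strlen(name + NAME_LEN_MASK);
}
```

The price is that bisection has to skip sizeof(struct entry_head) bytes
before each strcmp(), which is only cheap if that part really is fixed
size.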

Footnotes: 
[1]  It probably doesn't matter whether this is relative to the position
of the offset, or absolute (in terms of file pointer).

-- 
Thomas Rast
trast@{inf,student}.ethz.ch