Re: [PATCH/RFC] index-pack: produce pack index version 3

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 13 Aug 2012 00:40:55 -0700

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> On Mon, Aug 13, 2012 at 2:49 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> For example, the reachability bitmap would want to say something
>> like "Traversing from commit A, these objects in this pack are
>> reachable."  The bitmap for one commit A would logically consist of
>> N bits for a packfile that stores N objects (the resulting bitmap
>> needs to be compressed before going to disk, perhaps with RLE or
>> something).  With the single "sorted by SHA-1" table, we can use the
>> index in that single table to enumerate all reachable objects of any
>> type in one go.  With four separate tables, on the other hand, we
>> would need four bitmaps per commit.
>
> No we still need one per commit. The n-th bit is in the order of the
> object in the pack, not the index. How sha-1 is sorted does not
> matter.

Then what is the problem of using that same "N" that counts the
object using the order of the objects in the pack as a short-hand to
the object name instead, if your objective is to avoid having to
repeat the full 20-byte object name in your secondary tables?  That
does not call for any split in the list of SHA-1s, no?

>> Either way is _possible_, but I think the former is simpler, and the
>> latter makes it harder to introduce new types of objects in the
>> future, which I do not think we have examined possible use cases
>> well enough to make that decision to say "four types is enough
>> forever".
>
> New types can be put in one of those four tables, depending on its
> purpose. The reason I split because I care particularly about commits
> and trees. If the new type serves the same purpose as tree, for
> example, then it's better put in tree table...

And when there are multiple uses, one that wants to treat trees and
commits more or less the same way, and another that wants to treat
them in a distinct way, even without considering a new type, how
would your "we have four separate tables" help?  The former user
wants three "commits+trees", "tags" and "blobs" table while the
latter wants four, no?

>> In either way, we would have such bitmap (or a set of four bitmaps
>> in your case) for more than one commit (it is not necessary or
>> desirable to add the reachability bitmap to all commits), and such a
>> "reachability extension" would need to store a sequence of "the
>> commit object name the bitmap (or a set of four bitmaps) is about,
>> and the bitmap (or set of four bitmaps)".  That object name does not
>> have to be 20-byte but would be a varint representation of the
>> offset into the "sorted by SHA-1" table.
>
> How do you reach the bitmap, given its commit sha-1?

Look at the ordered list of SHA-1 in the pack idx file as an array
of 20-byte entries, and find the commit SHA-1 in that array by
Newton-Raphson/bisection.  You have an index into the array that
holds the object name.  You can use it as the local object number
in the packfile.  Then you can find that object number (that is
shorter than 20-byte object name) in your secondary table, no?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html