Re: [PATCH 12/23] pack v4: creation code

Nicolas Pitre <nico@xxxxxxxxxxx> · Tue, 27 Aug 2013 12:59:15 -0400 (EDT)

On Tue, 27 Aug 2013, Junio C Hamano wrote:

> Nicolas Pitre <nico@xxxxxxxxxxx> writes:
> 
> > Let's actually open the destination pack file and write the header and
> > the tables.
> >
> > The header isn't much different from pack v3, except for the pack version
> > number of course.
> >
> > The first table is the sorted SHA1 table normally found in the pack index
> > file.  With pack v4 we write this table in the main pack file instead, as
> > it is referenced by index from subsequent objects in the pack.  Doing so has
> > many advantages:
> >
> > - The SHA1 references used to be duplicated on disk: once in the pack
> >   index file, and then at least once within the commit and tree
> >   objects referencing them.  The only SHA1s not listed more than once
> >   this way are those of branch tip commit objects, and those are
> >   normally very few.  Now all that SHA1 data is represented only once.
> >
> 
> This tickles my curiosity. Why isn't this SHA-1 table sorted by
> reference count the same way as the tree path and the people name
> tables to keep the average length of varint references short?

Keeping the table sorted by SHA1 allows the SHA1 index stored in objects
to be used directly for lookups into the pack index, so the location of
the referenced object is known immediately, bypassing the binary search.
Furthermore, SHA1 references are rather evenly spread across the whole
table, so sorting by reference count would gain little.  Only tree
objects may share the same SHA1 references repeatedly across multiple
objects, and those are likely to end up being deltas against each other.
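
To illustrate the lookup argument, here is a minimal sketch, not the
actual pack v4 code.  It assumes a hypothetical in-memory view
(struct pack_view) in which an offset table runs parallel to the sorted
SHA1 table, and it uses zero-based table positions for simplicity.  The
names offset_by_index and offset_by_sha1 are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/*
 * Hypothetical view of a pack: sha1_table is sorted, and offsets[i] is
 * the pack offset of the object named by sha1_table[i].
 */
struct pack_view {
	const unsigned char (*sha1_table)[SHA1_LEN];
	const uint64_t *offsets;
	uint32_t nr_objects;
};

/*
 * Reference stored as a table position: the same number indexes the
 * offset table directly, so no search is needed.
 */
static uint64_t offset_by_index(const struct pack_view *p, uint32_t idx)
{
	assert(idx < p->nr_objects);
	return p->offsets[idx];
}

/*
 * Reference stored as a full SHA1: the position must first be found by
 * binary search.  Returns 0 and sets *offset on success, -1 if absent.
 */
static int offset_by_sha1(const struct pack_view *p,
			  const unsigned char *sha1, uint64_t *offset)
{
	uint32_t lo = 0, hi = p->nr_objects;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, p->sha1_table[mid], SHA1_LEN);

		if (!cmp) {
			*offset = p->offsets[mid];
			return 0;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

If the table were reordered by reference count instead, a table position
would no longer match the position in the SHA1-sorted index, and
something like offset_by_sha1 (or an extra mapping table) would be
needed again.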

Nicolas