Re: Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]

On Sunday 03 December 2006 03:46, Shawn Pearce wrote:
> Josef Weidendorfer <Josef.Weidendorfer@xxxxxx> wrote:
> > Thinking even one step further:
> > Would it make sense to define an encoding format for the content of
> > commit and tree objects inside of packs, where the SHA1 is replaced by the
> > offset of the object in this pack?
> > As the SHA1 is exactly the least compressible thing, this could promise
> > quite a benefit.
> [...]
> 
> This means that when we start to write out a commit we need to know
> the offset to the tree that commit references.  But git-pack-objects
> sorts objects by type: commit, tree, blob (I forget where tags go,
> but they aren't important in this context).  So generally *all*
> commits appear before the first tree.  So when we write out the first
> commit we need to know exactly how many bytes every commit will need
> (compressed mind you) in this pack so we can determine the position
> of the first tree.  Now do this for every commit and every tree
> that those commits use...  yes, it's a lot of work to precompute
> and store all offsets before you even write out the first byte.

Yes, it looks like a chicken-and-egg problem, but IMHO you can
handle it nicely with another level of indirection, i.e. a table that
you build up while repacking the file and store at the end.

You simply renumber every object SHA1 sequentially, starting from 0,
in the order you see them. You can do two renumberings: one for
the objects contained in the original pack (1), and one for the
external ones (2). Put these new numbers (with a bit distinguishing
(1) from (2)) into the commit/tree objects in place of the SHA1s.
At the end, you know the new offsets for the objects in (1). Put
redirection tables for (1) [new number -> new offset]
and (2) [other new number -> SHA1 of external object] at the end
of the new pack.
This way, you have effectively removed all incompressible SHA1s from
the pack file, aside from one entry in the redirection tables for
each external object.
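
To make the renumbering a bit more concrete, here is a rough sketch
in Python. It is only an illustration of the idea, not pack-objects
code: the names renumber, build_tables and EXTERNAL_BIT are made up,
as is the choice of the top bit of a 32-bit number as the (1)/(2)
flag, and the new offsets are simply assumed to be known once
everything has been written.

```python
EXTERNAL_BIT = 1 << 31  # hypothetical flag: set means "index into table (2)"


def renumber(refs, in_pack):
    """Replace each SHA1 in refs by a small number, in first-seen order.

    refs    -- SHA1s referenced by commit/tree objects, in the order seen
    in_pack -- set of SHA1s of the objects contained in this pack
    """
    local, external = {}, {}
    numbers = []
    for sha in refs:
        if sha in in_pack:
            # renumbering (1): objects inside the pack
            numbers.append(local.setdefault(sha, len(local)))
        else:
            # renumbering (2): external objects, marked with the flag bit
            n = external.setdefault(sha, len(external))
            numbers.append(n | EXTERNAL_BIT)
    return numbers, local, external


def build_tables(local, external, new_offsets):
    """Build the two redirection tables stored at the end of the pack."""
    # table (1): new number -> new offset (offsets known after writing)
    local_table = [None] * len(local)
    for sha, n in local.items():
        local_table[n] = new_offsets[sha]
    # table (2): other new number -> SHA1 of the external object
    external_table = [None] * len(external)
    for sha, n in external.items():
        external_table[n] = sha
    return local_table, external_table
```

The point of the sketch is that the numbers can be handed out before
any offset is known; only the two tables need the final offsets, and
they are written last.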

The only problem I see is how to decode the objects, i.e. how to
get the original SHA1 from an offset: we cannot recalculate the
SHA1 from the object content, as we changed the content itself.
But there should be a way to store the SHA1 in front of the object
somehow; perhaps it is already given by the current format?
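
Decoding a reference would then look roughly like the following
Python sketch. It assumes that a SHA1 can be looked up for every
offset in the pack, which is exactly the open question; the
sha_at_offset mapping, the resolve function and the EXTERNAL_BIT
flag are hypothetical names used only for illustration.

```python
EXTERNAL_BIT = 1 << 31  # hypothetical flag: set means "index into table (2)"


def resolve(ref, local_table, external_table, sha_at_offset):
    """Map a renumbered reference back to the SHA1 it replaced.

    local_table    -- table (1): new number -> new offset
    external_table -- table (2): other new number -> SHA1
    sha_at_offset  -- assumed lookup: offset -> SHA1 stored with the object
    """
    if ref & EXTERNAL_BIT:
        # external object: the SHA1 is stored directly in table (2)
        return external_table[ref & ~EXTERNAL_BIT]
    # object in this pack: go number -> offset -> stored SHA1
    return sha_at_offset[local_table[ref]]
```

So an object in the pack costs two lookups, and an external one costs
a single lookup in table (2).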

Am I missing something here?

Josef