Re: Calculating tree nodes

"Martin Langhoff" <martin.langhoff@xxxxxxxxx> · Tue, 4 Sep 2007 16:21:47 +1200

On 9/4/07, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> > Yes.  For performance reasons, since a simple commit would kill you in any
> > reasonably sized repo.
>
> That's not an obvious conclusion. A new commit is just a series of

Hi Jon!

If you search the archives you'll find Linus explaining that the
initial git had all the directory structure in one single "tree"
object that held all the paths, no matter how deep. The problem with
that was that every commit, even one touching a single file, generated
a huge new tree object, so he switched to the current "nested trees"
structure. That also has the nice feature of speeding up diffs/merges:
an unchanged subtree keeps the same SHA-1, so it can be skipped without
descending into it.
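To make the nested-trees point concrete, here is a minimal Python sketch
that hashes a snapshot the way Git names blob and tree objects (a
"<kind> <size>\0" header plus the body, hashed with SHA-1). The helper
names hash_object and write_tree are hypothetical, and the sketch only
handles regular files and directories (no executables, symlinks or
submodules). It shows that editing one file leaves the hash of an
untouched subdirectory unchanged, so that subtree object is reused as-is.

```python
import hashlib


def hash_object(kind: str, body: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 Git would use to name this object."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).digest()


def write_tree(entries: dict) -> bytes:
    """Hash a nested {name: bytes | dict} snapshot as Git-style trees.

    bytes values are file contents (blobs), dict values are subdirectories.
    Each subdirectory becomes its own tree object, named by its own SHA-1.
    """
    records = []
    for name, value in entries.items():
        if isinstance(value, dict):
            # Subdirectory: recurse; Git sorts trees as if named "name/".
            mode, sha, sort_key = "40000", write_tree(value), name + "/"
        else:
            mode, sha, sort_key = "100644", hash_object("blob", value), name
        records.append((sort_key.encode(), f"{mode} {name}\0".encode() + sha))
    records.sort(key=lambda rec: rec[0])
    return hash_object("tree", b"".join(entry for _, entry in records))


v1 = {"README": b"version 1\n", "src": {"main.c": b"int main(void) { return 0; }\n"}}
v2 = {"README": b"version 2\n", "src": {"main.c": b"int main(void) { return 0; }\n"}}

# Only README changed, so the src/ subtree hashes identically in both
# snapshots and can be shared (and skipped by a diff) as-is.
print(write_tree(v1["src"]) == write_tree(v2["src"]))  # True
print(write_tree(v1) == write_tree(v2))                # False
```

With a single flat tree, the whole path listing would have to be
rewritten and rehashed for every commit; here only the root tree
changes.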

> edits to the previous commit. Start with the previous commit, edit it,
> delta it and store it. Storing of the file objects is the same. Why
> isn't this scheme faster than the current one?

I think you're conflating two different things:

 - git is _snapshot_ based, so every commit, with its trees and blobs,
is a complete snapshot that doesn't depend on the previous one. The
"canonical" storage is each of those objects, zlib-compressed, in
.git/objects
 - however, for performance and on-disk footprint, we delta-compress
them (very efficiently, I hear)
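As a sketch of that "canonical" storage, assuming the standard
loose-object layout: each object is its "<kind> <size>\0" header plus
body, zlib-deflated and written under a path derived from its SHA-1.
loose_object and read_loose are hypothetical helpers for illustration,
not Git's own code.

```python
import hashlib
import zlib


def loose_object(kind: str, body: bytes) -> tuple[str, bytes]:
    """Return (path, on-disk bytes) for a loose object of the given kind."""
    raw = f"{kind} {len(body)}\0".encode() + body
    hex_sha = hashlib.sha1(raw).hexdigest()
    # The first two hex digits pick the fan-out directory.
    path = f".git/objects/{hex_sha[:2]}/{hex_sha[2:]}"
    return path, zlib.compress(raw)


def read_loose(data: bytes) -> tuple[str, bytes]:
    """Inflate a loose object and split off its "<kind> <size>\\0" header."""
    raw = zlib.decompress(data)
    header, _, body = raw.partition(b"\0")
    kind, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("corrupt object: size mismatch")
    return kind, body


path, data = loose_object("blob", b"")
print(path)              # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(read_loose(data))  # ('blob', b'')
```

Each such file holds the complete object, which is what makes every
snapshot independent of the ones before it.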

So if you ask the Git APIs about a tree, you end up dealing with the
nested trees I describe. Similarly, if you ask for a blob, you get the
full blob. But internally git _is_ delta-compressing them.

It doesn't delta them immediately: new objects are first written as
loose objects, and the delta compression happens when they get packed,
e.g. when you run git gc. But from an API perspective, you don't have
to worry about that.
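A toy illustration of "stored as a delta, returned in full": the delta
here is a list of copy/insert instructions against a base, computed with
Python's difflib. Git's pack format also uses copy and insert
instructions, but encodes them in its own binary form and chooses bases
with its own heuristics; make_delta, apply_delta and ToyStore are
hypothetical and only show why callers never see the deltas.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy/insert ops against base (toy format)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))    # reuse bytes from base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # literal new bytes
        # "delete" needs no op: those base bytes are simply not copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target by replaying the ops against base."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


class ToyStore:
    """Hypothetical object store: get() always returns full content."""

    def __init__(self):
        self._objects = {}

    def put_full(self, key, data: bytes):
        self._objects[key] = ("full", data)

    def put_delta(self, key, base_key, target: bytes):
        base = self.get(base_key)
        self._objects[key] = ("delta", base_key, make_delta(base, target))

    def get(self, key) -> bytes:
        entry = self._objects[key]
        if entry[0] == "full":
            return entry[1]
        _, base_key, ops = entry
        return apply_delta(self.get(base_key), ops)


base = b"".join(b"line %d\n" % i for i in range(200))
target = base.replace(b"line 100\n", b"line one hundred\n")
store = ToyStore()
store.put_full("v1", base)
store.put_delta("v2", "v1", target)
print(store.get("v2") == target)  # True
```

Callers of get() can't tell whether "v2" was stored whole or as a delta,
which is the point: the delta is purely a storage detail.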

HTH

martin