Re: Calculating tree nodes

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 03 Sep 2007 21:28:41 -0700

"Jon Smirl" <jonsmirl@xxxxxxxxx> writes:

>> Yes.  For performance reasons, since a simple commit would kill you in any
>> reasonably sized repo.
>
> That's not an obvious conclusion. A new commit is just a series of
> edits to the previous commit. Start with the previous commit, edit it,
> delta it and store it. Storing of the file objects is the same. Why
> isn't this scheme faster than the current one?

I think you are forgetting about tree comparison.

With a large project that has a reasonable directory structure
(i.e. not insanely narrow), a commit touches isolated subparts
of the whole tree.  Think of an architecture specific patch to
the Linux kernel touching only include/asm-i386 and arch/i386
directories.

Being able to cull an entire subdirectory (e.g. drivers/ which
has 5700 files underneath) by only looking at the tree SHA-1 of
the containing tree is a _HUGE_ win.
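
As a rough illustration of that culling, here is a toy sketch in
Python.  It is not git's implementation: Blob, Tree and
changed_paths are made-up names, and the hashing only imitates
the idea that a tree's id covers the ids of everything below it.

```python
import hashlib

class Blob:
    """Toy file object; its id depends only on its content."""
    def __init__(self, data: bytes):
        self.oid = hashlib.sha1(b"blob " + data).hexdigest()

class Tree:
    """Toy directory object; its id covers the names and ids of its entries."""
    def __init__(self, entries: dict):
        self.entries = entries
        h = hashlib.sha1(b"tree")
        for name in sorted(entries):
            h.update(name.encode() + b"\0" + entries[name].oid.encode())
        self.oid = h.hexdigest()

def changed_paths(old, new, prefix="", stats=None):
    """List differing paths, descending only into subtrees whose ids differ."""
    if stats is not None:
        stats["trees_visited"] += 1
    out = []
    for name in sorted(set(old.entries) | set(new.entries)):
        a, b = old.entries.get(name), new.entries.get(name)
        if a is not None and b is not None and a.oid == b.oid:
            continue  # same id: everything underneath is identical, skip it
        if isinstance(a, Tree) and isinstance(b, Tree):
            out += changed_paths(a, b, prefix + name + "/", stats)
        else:
            out.append(prefix + name)
    return out

# Hypothetical layout: a big drivers/ directory and a small arch/i386/.
drivers = Tree({"drv%d.c" % i: Blob(b"driver %d" % i) for i in range(5700)})
old = Tree({"drivers": drivers,
            "arch": Tree({"i386": Tree({"setup.c": Blob(b"v1")})})})
new = Tree({"drivers": drivers,
            "arch": Tree({"i386": Tree({"setup.c": Blob(b"v2")})})})

stats = {"trees_visited": 0}
result = changed_paths(old, new, stats=stats)
print(result, stats)
```

Only the root, arch and arch/i386 trees are opened here; the
5700 entries under drivers/ are never looked at, because the two
ids recorded for "drivers" match.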

And this is not just about comparing two trees.  When you say:

	git log v2.6.20 -- arch/i386/

what you are seeing is a simplified history that consists only
of commits that touch these paths.  How would we determine
efficiently whether a commit touches these paths?  By comparing
the "i386" entry in tree objects for $commit^:arch and
$commit:arch.  You do not have to look inside arch/i386/ trees
to see if any of the 330 files in it is different.  You just
check a single SHA-1 pair.
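
The same check can be sketched in Python against a toy object
store.  Everything here (store, put_blob, put_tree, lookup,
touches, simplify) is a hypothetical stand-in, not git's code,
and it assumes a linear history in which the first commit is
never reported; real history simplification also has to deal
with merges.

```python
import hashlib

store = {}  # toy object database: id -> bytes (blob) or {name: id} (tree)

def put_blob(data: bytes) -> str:
    oid = hashlib.sha1(b"blob\0" + data).hexdigest()
    store[oid] = data
    return oid

def put_tree(entries: dict) -> str:
    body = "".join(f"{n}\0{entries[n]}\n" for n in sorted(entries)).encode()
    oid = hashlib.sha1(b"tree\0" + body).hexdigest()
    store[oid] = dict(entries)
    return oid

def lookup(tree_oid, path):
    """Return the id recorded at path below tree_oid, or None if absent."""
    oid = tree_oid
    for part in path.strip("/").split("/"):
        node = store.get(oid)
        if not isinstance(node, dict) or part not in node:
            return None
        oid = node[part]  # read the entry; the object it names is not opened
    return oid

def touches(parent_tree, commit_tree, path):
    """A commit touches path iff the id recorded there differs from its parent's."""
    return lookup(parent_tree, path) != lookup(commit_tree, path)

def simplify(history, path):
    """history: list of (commit name, root tree id), oldest first."""
    return [name
            for (_, prev), (name, cur) in zip(history, history[1:])
            if touches(prev, cur, path)]

def root(i386, drivers):
    return put_tree({"arch": put_tree({"i386": i386}), "drivers": drivers})

i386_v1 = put_tree({"setup.c": put_blob(b"v1")})
i386_v2 = put_tree({"setup.c": put_blob(b"v2")})
drivers_v1 = put_tree({"net.c": put_blob(b"n1")})
drivers_v2 = put_tree({"net.c": put_blob(b"n2")})

history = [("c0", root(i386_v1, drivers_v1)),
           ("c1", root(i386_v1, drivers_v2)),   # touches drivers/ only
           ("c2", root(i386_v2, drivers_v2))]   # touches arch/i386/ only

print(simplify(history, "arch/i386/"))
```

Note that lookup() for "arch/i386/" opens only the root tree and
the arch tree; the i386 tree itself is never read, so the cost
does not depend on how many files live under it.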
