Re: libgit2 - a true git library

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Sat, 1 Nov 2008 18:50:41 -0700

Andreas Ericsson <ae@xxxxxx> wrote:
> Shawn O. Pearce wrote:
>>
>> Eh, I disagree here.  In git.git today "struct commit" exposes its
>> buffer with the canonical commit encoding.  Having that visible
>> wrecks what Nico and I were thinking about doing with pack v4 and
>> encoding commits in a non-canonical format when stored in packs.
>> Ditto with trees.
>
> Err... isn't that backwards?

No.

> Surely you want to store stuff in the
> canonical format so you're forced to do as few translations as
> possible?

No.  We suspect that canonical format is harder to decompress and
parse during revision traversal.  Other encodings in the pack file
may produce much faster runtime performance, and reduce page faults
(due to smaller pack sizes).

We hardly ever use the canonical format for actual output; most
output rips the canonical format apart and then formats the data
the way it was requested.  If we have the data *already* parsed in
the pack it's much faster to output.
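
To illustrate the "rips the canonical format apart" step, here is a
minimal sketch of what a reader has to do just to get the author time
out of a canonical commit buffer.  The function name demo_author_time
is hypothetical and is not part of git.git or libgit2; the only thing
assumed is the usual header layout ("author Name <email> time tz",
with a blank line ending the headers).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: scan the header lines of a canonical commit
 * buffer and return the author timestamp, or 0 if there is none.
 */
unsigned long demo_author_time(const char *buf)
{
	const char *line = buf;

	assert(buf != NULL);
	/* Headers stop at the first empty line. */
	while (*line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		const char *end = eol ? eol : line + strlen(line);

		if (!strncmp(line, "author ", 7)) {
			const char *gt = NULL;
			const char *p;

			/* The timestamp follows the last '>' on the line. */
			for (p = line; p < end; p++)
				if (*p == '>')
					gt = p;
			return gt ? strtoul(gt + 1, NULL, 10) : 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}
```

A pre-parsed encoding could hand back a stored integer instead of
scanning text on every commit.  What that encoding would actually
look like in pack v4 is not something this sketch tries to guess.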

> Or are you trying to speed up packing by skipping the
> canonicalization part?

Wrong; we're trying to speed up reading.  Packing may go slower,
especially during the first conversion of v2->v4 for any given
repository, but packing is infrequent so the minor (if any) drop
in performance here is probably worth the reading performance gains.

> Well, if macro usage is adhered to one wouldn't have to worry,
> since the macro can just be rewritten with a function later (if,
> for example, translation or some such happens to be required).
> Older code linking to a newer library would work (assuming the
> size of the commit object doesn't change anyway),

You are assuming too much magic.  If the older ABI used a macro
and the newer one (which supports pack v4) organized struct commit
differently and the user upgrades libgit2.so the older applications
just broke, horribly.
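
A minimal sketch of why, with made-up layouts.  Neither struct below
is git.git's real "struct commit" nor a proposed pack v4 layout, and
every demo_ name is invented for illustration.  The point is only
that a macro freezes a field offset into the application at compile
time, and the library cannot change that afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Layout the old application was compiled against (hypothetical). */
struct demo_commit_v1 {
	const char *buffer;	/* canonical commit text */
	unsigned long date;
};

/* Layout a newer library might switch to (equally hypothetical). */
struct demo_commit_v2 {
	unsigned long date;
	const void *packed;	/* some non-canonical encoding */
};

/* Offset a field-access macro baked into the old application. */
size_t demo_old_app_date_offset(void)
{
	return offsetof(struct demo_commit_v1, date);
}

/* Offset where the upgraded library really keeps the field. */
size_t demo_new_lib_date_offset(void)
{
	return offsetof(struct demo_commit_v2, date);
}

/* Nonzero when the old binary would read the wrong bytes. */
int demo_old_app_breaks(void)
{
	return demo_old_app_date_offset() != demo_new_lib_date_offset();
}

/* Self-check: aborts if the two layouts ever happen to agree. */
void demo_check_layouts_differ(void)
{
	assert(demo_old_app_breaks());
}
```

With these two layouts the old application keeps reading "date" from
behind the first pointer-sized slot, while the new library stores it
at offset 0.  No relinking against the new libgit2.so fixes that;
only recompiling the application does.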

We know we want to do pack v4 in the near future.  Or at least
experiment with it and see if it works.  If it does, we don't
want to have to cause a major ABI breakage across all those newly
installed libgit2s... yikes.

I'm really in favor of accessor functions for the first version of
the library.  They can always be converted to macros once someone
shows that their git visualizer program saves 10 ms on an 8,000 ms
render operation by avoiding accessor functions.  I'd rather spend
our brain cycles optimizing the runtime and the in-core data so
we spend less time in our tight revision traversal loops.
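
Roughly what I have in mind, as a sketch only.  All of the demo_
names are hypothetical and this is not a proposed libgit2 API; it
just shows the shape: the application sees an opaque type plus
functions, and the struct body stays private to the library.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header might expose (names are hypothetical) --- */
typedef struct demo_commit demo_commit;	/* opaque to applications */

demo_commit *demo_commit_new(unsigned long author_time);
unsigned long demo_commit_author_time(const demo_commit *c);
void demo_commit_free(demo_commit *c);

/* --- private to the library; free to change between releases --- */
struct demo_commit {
	unsigned long author_time;
};

demo_commit *demo_commit_new(unsigned long author_time)
{
	demo_commit *c = malloc(sizeof(*c));

	if (c)
		c->author_time = author_time;
	return c;
}

unsigned long demo_commit_author_time(const demo_commit *c)
{
	assert(c != NULL);
	return c->author_time;
}

void demo_commit_free(demo_commit *c)
{
	free(c);
}
```

Because the application only ever calls demo_commit_author_time(),
the library can reorder or replace the struct members later and
existing binaries keep working against the new shared object.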

Seriously.  We make at least 10 or 11 function calls *per commit*
that comes out of get_revision().  If the formatting application is
really suffering from its 4 or 5 accessor function calls in order
to get that returned data, we probably should also be looking at
how we can avoid function calls in the library.

Oh, and even with 4 or 5 accessor functions per commit in the
application that is *still* better than the 10 or so calls the
application probably makes today scraping "git log --format=raw"
off a pipe and segmenting it into the different fields it needs.

Unless pipes in Linux somehow allow negative time warping with
CPU counters.  Though on dual-core systems they might, since the
two processes can run on different cores.  But oh, you didn't want
to worry about threading support too much in libgit2, so I guess
you also don't want to use multi-core systems.

-- 
Shawn.