Re: Google Code: Support for Mercurial and Analysis of Git and Mercurial

A Large Angry SCM <gitzilla@xxxxxxxxx> · Sun, 26 Apr 2009 14:00:24 -0400

Johannes Schindelin wrote:
Hi,

On Sun, 26 Apr 2009, A Large Angry SCM wrote:

Johannes Schindelin wrote:

On Sun, 26 Apr 2009, A Large Angry SCM wrote:

Another important criterion was whether both, one, or neither of Git 
and Hg would actually work and perform well on top of Google Code's 
underlying storage system. Except to mention that they would be using 
Bigtable, the report did not discuss this. Git on top of Bigtable 
will not perform well.
Actually, did we not arrive at the conclusion that it could perform 
well at least with the filesystem layer on top of Bigtable, but even 
better if Bigtable stored certain chunks (not really all that 
different from the chunks needed for mirror-sync!)?

Back when I discussed this with a Googler, it was all too obvious that 
they are not interested (and in the meantime I understand why, see my 
other mail).
I don't remember the mirror-sync discussion. But I do remember that when 
the discussion turned to implementing a filesystem on top of Bigtable 
that would not cause performance problems for Git, my response was that 
you'd still be much better off going to GFS directly, without all of the 
Bigtable limitations, instead of faking a filesystem on top of Bigtable.

Umm, GFS is built on top of Bigtable, no?

Other way around: Bigtable is built on top of GFS.

Bigtable _is_ appealing to implement the Git object store on. It's too 
bad the latency in Bigtable would make it horribly slow.

If you store one object per Bigtable entry, yes.  If you store a few undelta'd 
objects there, and then use the pack run to optimize those tables, I think 
it would not be horribly slow.  Of course, you'd need to do exactly the 
same optimizations necessary for mirror-sync, but I might have mentioned 
that already ;-)

But now you have to find where you stored those "few undelta'd objects" 
and then go get the object you're interested in. The only way you can 
win with that scheme is if you can find groups of objects that are 
(almost) always accessed together, for all objects (and still not get 
tripped up by the other limitations of Bigtable).
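
To make the lookup cost concrete, here is a minimal sketch that counts 
round trips in a toy key-value store. Everything in it is an assumption 
made for illustration: ToyStore is a plain dict standing in for a 
high-latency store, not the Bigtable API, and the "index/" and "chunk/" 
key layout is hypothetical.

```python
import hashlib


class ToyStore:
    """Dict-backed stand-in for a high-latency store; counts reads only."""

    def __init__(self):
        self.rows = {}
        self.round_trips = 0

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        self.round_trips += 1  # every read is assumed to cost one round trip
        return self.rows[key]


def object_id(data: bytes) -> str:
    """SHA-1 of the raw bytes (a simplification of Git's object hashing)."""
    return hashlib.sha1(data).hexdigest()


def store_chunked(store, objects, chunk_size):
    """Group objects into chunks and write one index row per object."""
    ids = []
    for start in range(0, len(objects), chunk_size):
        chunk_key = f"chunk/{start // chunk_size}"
        chunk = {object_id(o): o for o in objects[start:start + chunk_size]}
        store.put(chunk_key, chunk)
        for oid in chunk:
            store.put(f"index/{oid}", chunk_key)  # where does this object live?
            ids.append(oid)
    return ids


def read_objects(store, ids):
    """Read objects by id, reusing chunks fetched earlier in the same call."""
    fetched = {}
    result = []
    for oid in ids:
        chunk_key = store.get(f"index/{oid}")        # step 1: find the chunk
        if chunk_key not in fetched:
            fetched[chunk_key] = store.get(chunk_key)  # step 2: fetch the chunk
        result.append(fetched[chunk_key][oid])
    return result
```

With 16 objects in chunks of 4, reading four objects from the same chunk 
costs 5 round trips in this model, while reading one object from each of 
the four chunks costs 8. The grouping only pays off when the access 
pattern matches it.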

One method would be to group all of the commit objects into one BT entry 
and then create a BT entry for each commit that contains all the trees 
and blobs. This may be fast enough for some operations but would cause 
the storage requirements to explode.
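
A rough sketch of why the per-commit grouping explodes storage, under 
stated assumptions: a hypothetical history in which every commit changes 
exactly one file, fixed-size blobs, and tree objects ignored for 
simplicity. The function names and object ids are made up for this 
example.

```python
def make_history(n_commits, n_files, file_size):
    """Hypothetical history: each commit after the first changes one file.

    Returns a list of commits, each a dict mapping object id -> blob bytes
    for every file reachable from that commit.
    """
    current = {i: f"f{i}-v0" for i in range(n_files)}
    commits = []
    for k in range(n_commits):
        if k > 0:
            i = k % n_files
            current[i] = f"f{i}-v{k}"  # new blob id for the changed file
        commits.append({oid: b"x" * file_size for oid in current.values()})
    return commits


def snapshot_storage(commits):
    """Bytes stored if each commit entry holds a full copy of its blobs."""
    return sum(len(data) for objs in commits for data in objs.values())


def deduplicated_storage(commits):
    """Bytes stored if each distinct object is kept exactly once."""
    unique = {}
    for objs in commits:
        unique.update(objs)
    return sum(len(data) for data in unique.values())


if __name__ == "__main__":
    history = make_history(n_commits=100, n_files=50, file_size=1000)
    print("one entry per commit:", snapshot_storage(history), "bytes")
    print("each object once:    ", deduplicated_storage(history), "bytes")
```

For 100 commits over 50 files of 1000 bytes each, the per-commit layout 
stores 5,000,000 bytes while keeping each object once stores 149,000 
bytes. The gap widens with every additional commit, because unchanged 
files are copied again each time.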