Johannes Schindelin wrote:
Hi,
On Sun, 26 Apr 2009, A Large Angry SCM wrote:
Johannes Schindelin wrote:
On Sun, 26 Apr 2009, A Large Angry SCM wrote:
Another important criterion was which of Git and Hg (both, or neither)
would actually work and perform well on top of Google Code's
underlying storage system. Except for mentioning that they would be
using Bigtable, the report did not discuss this. Git on top of Bigtable
will not perform well.
Actually, did we not arrive at the conclusion that it could perform
well at least with the filesystem layer on top of Bigtable, but even
better if Bigtable stored certain chunks (not really all that
different from the chunks needed for mirror-sync!)?
Back when I discussed this with a Googler, it was all too obvious that
they are not interested (and in the meantime I understand why, see my
other mail).
I don't remember the mirror-sync discussion. But I do remember that when
the discussion turned to implementing a filesystem on top of Bigtable
that would not cause performance problems for Git, my response was that
you'd still be much better off going to GFS directly, which has none of
the Bigtable limitations, instead of faking a filesystem on top of
Bigtable.
Umm, GFS is built on top of Bigtable, no?
Other way around.
Bigtable _is_ appealing to implement the Git object store on. It's too
bad the latency in Bigtable would make it horribly slow.
If you store one object per Bigtable entry, yes. If you store a few undelta'd
objects there, and then use the pack run to optimize those tables, I think
it would not be horribly slow. Of course, you'd need to do exactly the
same optimizations necessary for mirror-sync, but I might have mentioned
that already ;-)
But now you have to find where you stored those "few undelta'd objects"
and then go get the object you're interested in. The only way you can
win with that scheme is if you can find groups of objects that are
(almost) always accessed together, for all objects (and still not get
tripped up by the other limitations of Bigtable).
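To make the lookup cost concrete, here is a minimal sketch, assuming a plain in-memory dict stands in for Bigtable. The class name ChunkedStore, the object-to-chunk index, and the two counters are hypothetical and for illustration only; nothing here is a real Bigtable API. It shows that every read needs an index lookup to find the chunk, and that the scheme only saves fetches when the objects read together live in the same chunk.

```python
from typing import Dict


class ChunkedStore:
    """Toy model: several objects stored per entry (chunk). Not a real Bigtable client."""

    def __init__(self, chunks: Dict[str, Dict[str, bytes]]) -> None:
        # chunks maps a chunk key to the objects stored in that chunk.
        self.chunks = chunks
        # Hypothetical index: object id -> key of the chunk that holds it.
        self.index = {oid: key for key, objs in chunks.items() for oid in objs}
        self.index_lookups = 0
        self.chunk_fetches = 0
        self._fetched: Dict[str, Dict[str, bytes]] = {}

    def read(self, oid: str) -> bytes:
        # Step 1: find where the object was stored.
        self.index_lookups += 1
        key = self.index[oid]
        # Step 2: fetch the whole chunk unless it was already fetched.
        if key not in self._fetched:
            self.chunk_fetches += 1
            self._fetched[key] = self.chunks[key]
        return self._fetched[key][oid]


def fetches_for(chunks: Dict[str, Dict[str, bytes]], oids: list) -> int:
    """Number of chunk fetches needed to read the given objects in order."""
    store = ChunkedStore(chunks)
    for oid in oids:
        store.read(oid)
    return store.chunk_fetches


CHUNKS = {"k1": {"a": b"A", "b": b"B"}, "k2": {"c": b"C"}}
same_chunk = fetches_for(CHUNKS, ["a", "b"])       # objects grouped together
different_chunks = fetches_for(CHUNKS, ["a", "c"])  # objects in separate chunks
```

With well-chosen groups the second read is free; with poorly chosen groups every read pays both the index lookup and a chunk fetch.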
One method would be to group all of the commit objects into one BT entry
and then create a BT entry for each commit that contains all the trees
and blobs. This may be fast enough for some operations but would cause
the storage requirements to explode.
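A rough sketch of why the per-commit grouping explodes storage, assuming a made-up three-commit history in which each commit changes one blob out of three. All object names and sizes are hypothetical. The first function counts bytes when every commit's entry carries a full copy of its trees and blobs; the second counts bytes when each object is stored once.

```python
from typing import Dict, Set


def grouped_storage(commits: Dict[str, Set[str]], sizes: Dict[str, int]) -> int:
    """Bytes stored if each commit's entry holds a full copy of its objects."""
    return sum(sizes[obj] for objs in commits.values() for obj in objs)


def deduplicated_storage(commits: Dict[str, Set[str]], sizes: Dict[str, int]) -> int:
    """Bytes stored if every distinct object is stored exactly once."""
    unique: Set[str] = set().union(*commits.values())
    return sum(sizes[obj] for obj in unique)


# Hypothetical history: blob_b and blob_c never change, blob_a changes each commit.
SIZES = {"blob_a1": 100, "blob_a2": 100, "blob_a3": 100, "blob_b": 100, "blob_c": 100}
COMMITS = {
    "c1": {"blob_a1", "blob_b", "blob_c"},
    "c2": {"blob_a2", "blob_b", "blob_c"},
    "c3": {"blob_a3", "blob_b", "blob_c"},
}

grouped = grouped_storage(COMMITS, SIZES)
deduplicated = deduplicated_storage(COMMITS, SIZES)
```

In this toy history the grouped layout stores the unchanged blobs once per commit, so the gap widens with every commit that leaves most of the tree untouched.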