Re: Google Code: Support for Mercurial and Analysis of Git and Mercurial

Johannes Schindelin wrote:
> Hi,
>
> On Sun, 26 Apr 2009, A Large Angry SCM wrote:
>
>> Johannes Schindelin wrote:
>>
>>> On Sun, 26 Apr 2009, A Large Angry SCM wrote:
>>>
>>>> Another important criterion was which of Git and Hg (both, one, or neither) would actually work and perform well on top of Google Code's underlying storage system; except for mentioning that they would be using Bigtable, the report did not discuss this. Git on top of Bigtable will not perform well.
>>>
>>> Actually, did we not arrive at the conclusion that it could perform well at least with the filesystem layer on top of Bigtable, but even better if Bigtable stored certain chunks (not really all that different from the chunks needed for mirror-sync!)?
>
> Back when I discussed this with a Googler, it was all too obvious that they are not interested (and in the meantime I understand why; see my other mail).

>> I don't remember the mirror-sync discussion. But I do remember that when the discussion turned to implementing a filesystem on top of Bigtable that would not cause performance problems for Git, my response was that you'd still be much better off going to GFS directly instead of faking, on top of Bigtable, a filesystem free of Bigtable's limitations.
>
> Umm, GFS is built on top of Bigtable, no?

Other way around: Bigtable is built on top of GFS.

>> Bigtable _is_ appealing to implement the Git object store on. It's too bad the latency in Bigtable would make it horribly slow.

> If you store one object per Bigtable entry, yes. If you store a few undelta'd objects there, and then use the pack run to optimize those entries, I think it would not be horribly slow. Of course, you'd need to do exactly the same optimizations necessary for mirror-sync, but I might have mentioned that already ;-)
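A minimal sketch of the scheme described above, under the assumption that "a few undelta'd objects" share one BT entry and that a later pack run coalesces small entries into larger ones; plain Python dicts stand in for Bigtable, and every name here is hypothetical:

```python
# Each entry holds a small batch of undelta'd objects, keyed by object id.
entries = []          # list of dicts: one dict per hypothetical BT entry
BATCH = 4             # objects written per entry before opening a new one

def write_object(obj_id: str, payload: bytes) -> None:
    """Append to the current small entry, starting a new one when full."""
    if not entries or len(entries[-1]) >= BATCH:
        entries.append({})
    entries[-1][obj_id] = payload

def pack_run(target: int = 16) -> None:
    """Coalesce many small entries into fewer large ones, the way a
    repack turns loose objects into packs (ordering and delta selection
    are omitted from this sketch)."""
    merged, current = [], {}
    for entry in entries:
        current.update(entry)
        if len(current) >= target:
            merged.append(current)
            current = {}
    if current:
        merged.append(current)
    entries[:] = merged

# 40 writes land in ten 4-object entries; the pack run merges them
# into three larger entries, cutting the number of reads per walk.
for n in range(40):
    write_object(f"obj{n}", b"payload")
pack_run()
```

The point of the pack run here is the same as in git: the write path stays cheap, and a background pass restores read locality.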

But now you have to find where you stored those "few undelta'd objects" and then go get the object you're interested in. The only way you can win with that scheme is if you can find groups of objects that are (almost) always accessed together, for all objects (and still not get tripped up by the other limitations of Bigtable).
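The indirection being objected to can be sketched with dicts standing in for a Bigtable index and its entries (every name below is hypothetical): each object read costs an index lookup plus an entry lookup, so the grouping only pays off when one entry fetch serves many of the objects a walk actually needs.

```python
import hashlib

# Stand-ins for Bigtable: each "entry" holds a small group of undelta'd
# objects; a separate index maps object id -> entry key. Both are plain
# dicts here; against real Bigtable each lookup is a network round trip.
entries = {}   # entry key -> {object id: object payload}
index = {}     # object id -> entry key

def oid(payload: bytes) -> str:
    """Content-address an object, git-style (SHA-1 of the payload)."""
    return hashlib.sha1(payload).hexdigest()

def store_group(entry_key: str, payloads: list) -> None:
    """Write a group of objects into one entry and index each member."""
    entries[entry_key] = {}
    for p in payloads:
        i = oid(p)
        entries[entry_key][i] = p
        index[i] = entry_key

def load(i: str) -> bytes:
    # Two lookups per object: one against the index, one against the
    # entry. This indirection is the cost the scheme has to amortize by
    # grouping objects that are (almost) always accessed together.
    return entries[index[i]][i]

store_group("group-0", [b"commit 1", b"tree 1", b"blob 1"])
assert load(oid(b"tree 1")) == b"tree 1"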

One method would be to group all of the commit objects into one BT entry and then create a BT entry for each commit that contains all of that commit's trees and blobs. This may be fast enough for some operations, but it would cause the storage requirements to explode, since unchanged trees and blobs would be stored again for every commit that references them.
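A back-of-the-envelope sketch of that explosion; all of the numbers below are made up purely for illustration:

```python
# Hypothetical repository: each commit changes a handful of files, so
# consecutive commits share almost all of their trees and blobs.
commits = 20_000
objects_per_commit = 5_000      # trees + blobs reachable from one commit
new_objects_per_commit = 10     # objects a typical commit actually adds
avg_object_size = 2_000         # bytes, undelta'd

# Deduplicated store (what git does): each unique object kept once.
unique_objects = objects_per_commit + (commits - 1) * new_objects_per_commit
dedup_bytes = unique_objects * avg_object_size

# One BT entry per commit holding *all* of its trees and blobs:
# shared objects are rewritten for every commit that references them.
per_commit_bytes = commits * objects_per_commit * avg_object_size

blow_up = per_commit_bytes / dedup_bytes   # several-hundred-fold here
```

With these (invented) numbers the per-commit scheme stores a few hundred times more bytes than a deduplicated object store, and the ratio grows with history length.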
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
