Scott Chacon <schacon@xxxxxxxxx> wrote:
> Has anyone watched this yet?
>
> http://code.google.com/events/io/sessions/MercurialBigTable.html

I hadn't seen that yet, thanks.

> It's kind of interesting - a Googler talks about getting Mercurial
> running on BigTable. What fascinates me is that if I'm not horribly
> mistaken, it seems like they just threw out the revlog format entirely
> and just store the data in a key-value store as sort of a Git-like
> content addressable filesystem.

Almost... but not quite. If you look at the way they store files,
they embed the file path as part of the BigTable key. This makes
it cheap to return all revisions between X and Y for any given file,
as it's just a range scan over the keys. Git doesn't do this normally.

In Hg, and in their implementation of it on BigTable, if a file's
content is copied between two paths (same blob in Git terms) they
actually duplicate the data, once under each path.

We could do something like that in Git... and just pay the price on
copy, and then you can get a storage layout like they do, and have it
scale well onto a larger system. But... the pack the client receives
will suffer: it will be bigger.

> Does anyone know how they do the graph walking efficiently with this
> structure? He mentioned it was about half as fast as native Hg, but
> that seemed to be acceptable. Curious if anyone had any thoughts or
> information on this. Shawn, are there technical reasons why this
> works well the way they're doing it for Hg but would not for Git (like
> in the repo MINA based server)? It looks like the data structure and
> protocol exchange are incredibly similar after they threw away all the
> revlog stuff.

I think they also added more pointers and data caches that don't
exist in Hg normally, but exist in their BigTable backend. For
example, precomputing a pointer from each commit to its most recent
ancestor that is a merge; I think that was mentioned in the talk.

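To make the path-in-the-key idea above concrete, here is a rough
sketch. It is not their actual schema: the key layout (path, a NUL
separator, a zero-padded revision number), the SortedStore class and
the helper names are all made up for illustration. It only shows why
"all revisions between X and Y of one file" becomes a single range
scan when keys are kept sorted, and why a copy duplicates the data.

```python
import bisect

def make_key(path, rev):
    # Hypothetical layout: path, NUL separator, zero-padded revision.
    # NUL sorts below every printable character, so the keys of one
    # path stay contiguous and never interleave with a longer path.
    return f"{path}\x00{rev:010d}"

class SortedStore:
    """Toy stand-in for an ordered key-value store (assumption)."""

    def __init__(self):
        self._keys = []    # kept sorted at all times
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        # Inclusive range scan over the sorted keys.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

def revisions_between(store, path, x, y):
    # One contiguous scan: no per-revision lookups are needed.
    return [v for _, v in store.scan(make_key(path, x), make_key(path, y))]

store = SortedStore()
for rev, text in [(1, "a"), (2, "b"), (5, "c")]:
    store.put(make_key("src/main.c", rev), text)
store.put(make_key("src/main.h", 3), "h")

# A copy stores the same content again under the new path, which is
# the "pay the price on copy" cost described above.
store.put(make_key("src/copy.c", 6), "c")

print(revisions_between(store, "src/main.c", 2, 5))
```

With a content-addressed layout like Git's, the same query has no
such contiguous range to scan, which is the trade-off being discussed.
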
The JGit/MINA based servers run git "well enough", but that's off
local disk, and we do pay a good price compared to C Git. E.g. we
really need a revcache to accelerate the object enumeration phase,
which takes ages in JGit. And indexing a pushed pack is rather slow
compared to C Git; a large push could take up to a minute or two to
fully index and fsck.

> Or is it just that they're fine with the speed loss and
> the Android project would not be?

What does Android have to do with Hg? Android went with Git for
a lot of reasons, none of them having to do with the performance
or availability of Hg on code.google.com. All of them had to do
with Git being a really solid DVCS that has a very bright future.

--
Shawn.