Scott Chacon wrote:
Has anyone watched this yet? http://code.google.com/events/io/sessions/MercurialBigTable.html It's kind of interesting - a Googler talks about getting Mercurial running on BigTable. What fascinates me is that if I'm not horribly mistaken, it seems like they just threw out the revlog format entirely and just store the data in a key-value store as sort of a Git-like content addressable filesystem.
It does indeed seem like that, yes. Would have been fun to be there to congratulate him on implementing something that's already existed for about three years ;-)
I had thought they were taking advantage of the revlog structure somehow, but it appears they basically just changed the underlying data format to be much more like Git and rewrote an Hg-speaking server on top of that. They even explicitly store the head values like refs instead of reading childless nodes out of the revlog, which is what I thought Hg did.
Well, storing the head values as refs is the only thing that makes sense if you're using a database to track things, since you'd otherwise have to map in too much data to get any sort of performance at all out of it.
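To illustrate the difference, here is a rough sketch in Python. A plain dict stands in for the database, and every name in it (commits, refs, heads_by_scan, heads_by_ref) is made up for the example; none of it comes from the talk or from Hg itself.

```python
# Hypothetical sketch: a dict stands in for the key-value database.
# Keys are changeset ids, values are lists of parent ids.
commits = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],
}

def heads_by_scan(store):
    """Find childless nodes by reading every changeset in the store."""
    has_child = set()
    for parents in store.values():
        has_child.update(parents)
    return sorted(cid for cid in store if cid not in has_child)

# Explicit refs: one small record that would be updated on every push.
refs = {"heads": ["c", "d"]}

def heads_by_ref(ref_store):
    """Read the stored head list: one lookup, regardless of history size."""
    return sorted(ref_store["heads"])
```

Both functions give the same answer here, but the scan has to touch every changeset, while the ref lookup reads a single record.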
Does anyone know how they do the graph walking efficiently with this structure? He mentioned it was about half as fast as native Hg, but that seemed to be acceptable.
Yes: they don't. DAG walking means they have to look up several changesets one after another, and since they don't know the order up front, they have to pay the penalty of fetching each commit from the BigTable database over the network. It would be similar to storing git objects in a database on a different host, which would also be quite a lot slower than just hitting an mmap()'ed file in binary form.
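A minimal sketch of why that hurts, assuming a key-value store where every lookup is one network round trip. RemoteStore and walk_ancestors are hypothetical names for illustration, and the dict behind them is only a stand-in for a real remote database.

```python
from collections import deque

class RemoteStore:
    """Stand-in for a networked key-value store; counts lookups as round trips."""
    def __init__(self, data):
        self._data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data[key]

def walk_ancestors(store, head):
    """Breadth-first walk from head.

    A commit's parents are only known after the commit itself has been
    fetched, so the lookups cannot be batched up front.
    """
    seen = {head}
    queue = deque([head])
    order = []
    while queue:
        cid = queue.popleft()
        parents = store.get(cid)  # one fetch per commit
        order.append(cid)
        for p in parents:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return order

# Keys are commit ids, values are lists of parent ids.
store = RemoteStore({"a": [], "b": ["a"], "c": ["b"], "d": ["b", "c"]})
history = walk_ancestors(store, "d")
```

Walking four commits costs four round trips in this toy setup. With a local mmap()'ed file the same lookups are just memory reads.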
Curious if anyone had any thoughts or information on this. Shawn, are there technical reasons why this works well the way they're doing it for Hg but would not for Git (like in the repo MINA based server)? It looks like the data structure and protocol exchange are incredibly similar after they threw away all the revlog stuff. Or is it just that they're fine with the speed loss and the Android project would not be?
I'm more curious as to why they didn't choose git. The only explanation that was actually true is that hg works well over HTTP (if you can call 3 network requests per not-up-to-date head "well"). Since I can't imagine them not doing proper research before launching a project that almost certainly cost quite a lot of money, and I personally think that the "http rules all" explanation sounded weak, I'm guessing there were other reasons why they didn't go with git instead, and I'm fairly curious to hear them. If I were to take a guess, I'd say git is written in a pretty unfriendly way for implementing other storage engines. Ah well. In a year or two they'll probably support git as well. One can hope at least ;-)
--
Andreas Ericsson andreas.ericsson@xxxxxx
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace.