Hi, guys:

I spent a few days playing with Zookeeper, with an eye on replacing CLD with it. The short recommendation: don't do it, at least for now, but reconsider if any sister services make good use of it (e.g. if MRG/DC image store does).

The easiest way to replace CLD would be to use Zookeeper as if it were CLD, so I wrote a test that locked a file like cldu.c does now. It was not too bad, but I learned two things:

- what exactly Garzik was saying about "different focus" in Q&As after his presentations, and
- locking anything is a really awkward thing to do in Zookeeper.

About the focus: ZK is just like CLD from a certain angle (it has the good old files and provides a set of un-posixy operations on them: watches, uniques, "ephemerals"), but it's also entirely unlike CLD (e.g. no locks in the protocol). CLD's model is that clients are daemons, each of which reads a few of its files, maybe locks one or two at boot, and then nothing happens except keepalives. Zookeeper's model... honestly I don't know what it is, because it's never explained concisely, but the docs that I saw seem to imply huge numbers of clients all doing random ops all the time on the same files, enough to raise herd-effect concerns. It looks like Yahoo may be using Zookeeper as a lease manager or something. Crazy.

I heard people say they cribbed from the same Chubby paper, but it's bollocks. It's absolutely nothing like what Chubby implies. No locks, for one thing.

To be sure, Zookeeper provides a canned piece of code which implements locks, kinda like you can implement compare-and-swap using Dekker's algorithm on a CPU that doesn't have it. The canned lock creates "sequenced" files (using a ZK server call that creates unique filenames), then sets some "watches" (same as CLD offers), then re-reads the directory to find the lowest-numbered sequential file, whose creator is the winner of the lock. Haha, only serious. I tested it, it works, but ewwwww.

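For illustration, the winner-picking step of that canned lock can be sketched roughly as below. This is a toy in-memory stand-in, not the Zookeeper client API: FakeZk, create_sequential, holds_lock and the "/lock" path are all made-up names for this sketch, and the watch step is left out (a real client would wait on a watch instead of re-checking by hand). The 10-digit zero-padded suffix mirrors how ZK names sequenced nodes.

```python
class FakeZk:
    """Toy in-memory stand-in for a ZK server (hypothetical, not a real API)."""

    def __init__(self):
        self._children = {}   # directory -> list of child names
        self._next_seq = {}   # directory -> next sequence number to hand out

    def create_sequential(self, directory, prefix):
        # The "server" appends a per-directory, monotonically increasing,
        # zero-padded counter, so every created name is unique.
        seq = self._next_seq.get(directory, 0)
        self._next_seq[directory] = seq + 1
        name = "%s%010d" % (prefix, seq)
        self._children.setdefault(directory, []).append(name)
        return name

    def get_children(self, directory):
        return list(self._children.get(directory, []))

    def delete(self, directory, name):
        self._children[directory].remove(name)


def sequence_of(name):
    # The last 10 characters are the counter appended by create_sequential.
    return int(name[-10:])


def holds_lock(zk, directory, my_node):
    # Re-read the directory; the lowest-numbered sequenced file wins.
    children = zk.get_children(directory)
    return min(children, key=sequence_of) == my_node


zk = FakeZk()
first = zk.create_sequential("/lock", "client-")
second = zk.create_sequential("/lock", "client-")
print(holds_lock(zk, "/lock", first))    # True: lowest sequence number
print(holds_lock(zk, "/lock", second))   # False: must wait its turn
zk.delete("/lock", first)                # the holder releases the lock
print(holds_lock(zk, "/lock", second))   # True: now the lowest remaining
```

Note that "unlocking" here is just deleting your own sequenced file, which is what makes the next-lowest file the new winner.
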
The Zookeeper folks clearly want daemons to approach the whole problem in a different way. For example, there's a similar canned recipe to identify a "leader" client.

Overall, ZK seems like a mature, if quirky, system. Quirky means that I made my client OOM hard by using the wrong compilation options, and it took me a while to figure it out (PROTIP: do not use "single-threaded" mode in Zookeeper; it is not loved, and canned recipes may plain not work with it). There were some other weird stories. But it definitely works.

Unfortunately, with the latest timer fix, CLD works too: I've not seen a server crash in a couple of months. So I do not see an upside for us to switch at this point, and I have better things to do than learning the Zookeeper ropes for weeks.

BTW, Zookeeper is not packaged in Fedora. You have to install it by hand. Thank heavens for /usr/local.

Dunno what their community is like. I'm going to send a trivial patch to them and see what happens.

-- Pete

--
To unsubscribe from this list: send the line "unsubscribe hail-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html