Re: Zookeeper instead of CLD in Hail

Jeff Darcy <jdarcy@xxxxxxxxxx> · Wed, 09 Jun 2010 10:35:29 -0400

On 06/09/2010 12:49 AM, Colin McCabe wrote:
> Well, Zookeeper is written in Java. Presumably it requires you to put
> a JVM on each node. Unfortunately the JVM has kind of a large memory
> footprint. That's great if your software is written in Java. If your
> company didn't go that route, ZK doesn't seem like such a great
> option.

I'm no fan of the Java "ecosystem" myself, and personally I do consider
that a drawback, but the fact remains that most of the people building
these sorts of applications are using Java or other VM-based languages
such as Ruby or Erlang, and many do already have ZK running.  Whether
those are good choices or bad choices, they're common choices.

> The Chubby paper specifically calls out using Chubby for publish /
> subscribe as "abusive behavior." But ZooKeeper "is used at Yahoo! as
> the coordination and failure recovery service for Yahoo! Message
> Broker, which is a highly scalable publish-subscribe system" according
> to hadoop.apache.org.
> 
> Although they might be equivalent in some theoretical computer science
> sense, I get the impression that the two systems are very different
> beasts...

They're different, certainly, and CLD is different again, but they're
also all related.  It's also worth pointing out that YMB's use of ZK can
be considered proof that both ZK's semantics and its implementation are
sufficient for experienced domain experts to create production-level
systems (even at Yahoo's scale).  So far, there's no similar proof point
for CLD's semantics or implementation.

>> Nonetheless, those are features both share, and many might argue that
>> they're preferable to locks.  Locking is a fundamentally lousy way to
>> build scalable and reliable distributed systems, as has been well known
>> for more than a decade.
> 
> I think *fine-grained* locking is a fundamentally lousy way to build
> distributed systems. I haven't heard anyone argue that coarse-grained
> locking is bad.
> Have you read any interesting papers or books about this topic?

I think first we'd have to agree on a distinction between fine-grained
and coarse-grained.  ;)  The issue here is that you can do locking in ZK
quite easily.  It's just not very efficient, but you don't need it to be
efficient if the locks are coarse-grained anyway, so the criticism about
ZK not having locks becomes quite meaningless.  Ephemeral nodes can be
used to synthesize the only kind of locks applications should be using,
and can also be used in other ways.  If you can do everything (that you
should be doing) in X that you can do in Y but not vice versa, and Y is
not provably faster or more reliable, then most developers would rightly
prefer X.
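
To make the ephemeral-node point concrete, here is a minimal sketch of
the usual lock pattern: each client creates an ephemeral, sequentially
numbered node under the lock path, and whoever owns the lowest-numbered
node holds the lock.  Everything below is a toy in-memory model written
for illustration.  ToyCoordinator and ToyLock are hypothetical names,
not the ZooKeeper client API, and a real client would set a watch on its
predecessor node rather than poll.

```python
import itertools


class ToyCoordinator:
    """In-memory stand-in for a ZK-like namespace (NOT the real ZooKeeper API)."""

    def __init__(self):
        self._seq = itertools.count()
        self._nodes = {}  # node name -> id of the session that owns it

    def create_ephemeral_sequential(self, prefix, session):
        # Zero-padded counter so lexicographic order matches creation order.
        name = f"{prefix}{next(self._seq):010d}"
        self._nodes[name] = session
        return name

    def children(self, prefix):
        return sorted(n for n in self._nodes if n.startswith(prefix))

    def delete(self, name):
        self._nodes.pop(name, None)

    def expire_session(self, session):
        # Ephemeral nodes vanish when the session that created them dies.
        for name in [n for n, s in self._nodes.items() if s == session]:
            del self._nodes[name]


class ToyLock:
    """Coarse-grained lock: the owner of the lowest-numbered node holds it."""

    def __init__(self, coord, path, session):
        self.coord = coord
        self.prefix = path + "/lock-"
        self.session = session
        self.node = None

    def enqueue(self):
        self.node = self.coord.create_ephemeral_sequential(self.prefix, self.session)

    def holds_lock(self):
        queue = self.coord.children(self.prefix)
        return bool(queue) and queue[0] == self.node

    def release(self):
        self.coord.delete(self.node)
        self.node = None


coord = ToyCoordinator()
a = ToyLock(coord, "/locks/volume", session="client-a")
b = ToyLock(coord, "/locks/volume", session="client-b")
a.enqueue()
b.enqueue()
print(a.holds_lock(), b.holds_lock())  # True False

# client-a crashes: its session expires and the lock passes to client-b
# without anyone having to clean up explicitly.
coord.expire_session("client-a")
print(a.holds_lock(), b.holds_lock())  # False True
```

The property that matters is the last step: a dead holder's node
disappears with its session, so the lock cannot be orphaned.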

> However... It seems to me that once you make the decision to use
> ZooKeeper and Java, Walrus or Hadoop is probably a more practical
> choice for the upper layers.

Well, Hadoop is an entirely different kind of beast, and it might even
be worth exploring how Hail components might be used as Hadoop/MR data
sources or sinks.  Walrus is pretty directly comparable to tabled, but
mostly comes off worse in the comparison.  For one thing, it's even less
scalable.  A Java-based program would mostly not care about the
differences between tabled, Walrus, ParkPlace or Amazon S3 since they're
all behind the same HTTP-based protocol anyway.  (I have code that I
regularly test against tabled and Amazon without change.)  Where the
difference really becomes noticeable is not on the development side but
on the deployment side.  Installing chunkd and tabled now always
requires installing a third component (CLD) as well, and since it's
supposed to be a highly available service that means care must be taken
to deploy it on physically separate machines etc. to avoid correlated
failures.  In a "green field" deployment of tabled/ZK there'd still be a
third component to install, but at least there's no DNS wart so there'd
be no need to negotiate with the people (often a separate group) who
control the local DNS.  More importantly, in environments where ZK has
already been deployed - and they're quite common - there'd be no need
for a third component, and no need to re-do all of that planning or
configuration.  Back on the development side, there'd also be no need to
deal with situations where only one of ZK and CLD had failed - and, no
matter how good we all think we are at developing and testing our code,
no responsible app developer can rule out such failures.
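
As a rough illustration of the earlier point that client code mostly
doesn't care which S3-compatible store sits behind the protocol, the
sketch below builds path-style object URLs where only the endpoint
changes.  The tabled host name and port are hypothetical placeholders,
and request signing/authentication is deliberately left out.

```python
from urllib.parse import quote

# Only the endpoint differs between back ends; the request shape does not.
ENDPOINTS = {
    "tabled": "http://tabled.example.internal:8080",  # hypothetical host/port
    "amazon": "https://s3.amazonaws.com",
}


def object_url(endpoint, bucket, key):
    """Build a path-style URL: <endpoint>/<bucket>/<key>."""
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"


for name, endpoint in ENDPOINTS.items():
    print(name, object_url(endpoint, "backups", "2010/06/db.dump"))
```

Switching stores is then a configuration change rather than a code
change, which is why the real differences show up at deployment time.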

That said, Pete has rightly pointed out that switching chunkd/tabled to
ZK would require significant effort.  While I do believe there are
benefits, it's not clear they're great enough to justify that effort.
There are definitely other things - e.g. scalability/authentication
improvements, more testing - that even I would consider higher priorities.