Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)

Philip Hallstrom <postgresql@xxxxxxxxxxxxxxx> · Mon, 27 Aug 2007 14:28:00 -0700 (PDT)

Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> writes:
First off, "clustering" is a word that is too vague to be useful, so
I'll stop using it.

Right.  MySQL Cluster, on the other hand, is a very specific technology.
http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster.html

It is, however, capable of being d*mn fast for read-mostly workloads
that can fit their whole dataset into RAM --- and with the price of

There are, however, some things that won't work (or won't work well)
using NDB that will drive you crazy.

VARCHARs aren't really varchars: they are stored at their maximum
declared length.  There's also a limit on overall row length which is
pretty small (I don't remember what it is offhand).  Cluster doesn't
really enjoy processing queries with left outer joins, or joins in
general; what takes <1s on a single MySQL instance can take several
seconds on the cluster.  Some of this is because the storage nodes
can't do the join themselves, so they copy all the tables involved to
the API nodes for processing.  Even on a fast network this takes a lot
of time.  You can't have a query with two OR'd LIKE clauses.  Instead
you have to break them into their own queries and UNION the results.
You can't insert/update/delete more than 32000 rows at a time.  In
practice (and no, I don't understand why) sometimes this really means
more like 10000.
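
For what it's worth, here is a rough sketch of how the last two
workarounds might look from application code.  It is only an
illustration: the function names, the 10000 batch size (taken from the
"in practice" figure above, not from any documented constant) and the
%s placeholder style (as used by drivers with the DB-API "format"
paramstyle) are all my assumptions, and the table and column names are
expected to be trusted identifiers, not user input.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Assumption: conservative per-statement row count, based on the
# "more like 10000" observation rather than a documented NDB limit.
BATCH_SIZE = 10000


def batches(rows: Iterable[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit whatever is left over after the last full batch.
        yield batch


def union_of_likes(table: str, column: str, n_patterns: int) -> str:
    """Build one SELECT per LIKE pattern and UNION them together.

    Returns SQL text with one %s placeholder per pattern; the patterns
    themselves are passed separately as query parameters.
    """
    if n_patterns <= 0:
        raise ValueError("need at least one pattern")
    select = f"SELECT * FROM {table} WHERE {column} LIKE %s"
    return " UNION ".join([select] * n_patterns)
```

Each list yielded by batches() would then go into its own
insert/update/delete and be committed before the next one is sent, and
the string from union_of_likes() would be executed with the LIKE
patterns passed as parameters.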

Most annoying however is that to make a change to the database schema you 
have to shut down all the nodes except one.  Not sure if this is typical 
of other systems or not, but it kind of sucks :/

There are other things too, but I don't remember what they are until I
build something that works fine with a single MySQL instance and then
doesn't on the cluster...

-philip
