Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)

Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> · Sat, 25 Aug 2007 10:18:25 -0400

"Phoenix Kiula" <phoenix.kiula@xxxxxxxxx> wrote:
>
> We're moving from MySQL to PG, a move I am rather enjoying, but we're
> currently running both databases. As we web-enable our financial
> services in fifteen countries, I would like to recommend the team that
> we move entirely to PG.
> 
> In doing research on big installations of the two databases, I read
> this from a MySQL senior exec on Slashdot:

Senior MySQL exec means this is a marketing blurb, which means it's
exaggerated, lacking any honest assessment of challenges and difficulties,
and possibly an outright lie.  I've no doubt that MySQL can do clusters
if you know what you're doing, but if you want the facts, you're going
to have to look deeper than that obviously biased quote.  I seem to
remember a forum thread with someone having considerable difficulty
with MySQL cluster, and actual MySQL employees jumping in to try to
help and no solution ever found.  Anyone have that link lying around?

In any event, replication is a large and complex topic.  To do it well
takes research, planning, and know-how.  Anyone who tells you their
solution will just drop in and work is either lying or charging you
a bunch of money for their consultants to investigate your scenario
and set it up for you.

First off, "clustering" is a word that is too vague to be useful, so
I'll stop using it.  There's multi-master replication, where every
database is read-write, then there's master-slave replication, where
only one server is read-write and the rest are read-only.  You can
add failover capabilities to master-slave replication.  Then there's
synchronous replication, where all servers are guaranteed to get
updates at the same time.  And asynchronous replication, where other
servers may take a while to get updates.  These descriptions aren't
really specific to PostgreSQL -- every database replication system
has to make design decisions about which approaches to support.

PostgreSQL has some built-in features to allow synchronous multi-master
database replication.  Two-phase commit allows you to reliably commit
transactions to multiple servers concurrently, but it requires support
at the application level, which will require you to rewrite any existing
applications.

Pgcluster is multi-master synchronous replication, but I believe it's
still in beta.

Note that no synchronous replication system works well over
geographically large distances.  The time required for the masters
to synchronize over (for example) the Internet kills performance to
the point of uselessness.  Again, this is not a PostgreSQL problem,
MSSQL suffers the same problem.  Synchronous replication is only
really used when two servers are right next to each other with a
high-speed link (probably gigabit) between them.

PostgreSQL-R is in development, and targeted to allow multi-master,
asynchronous replication without rewriting your application.  As
far as I know, it works, but it's still beta.

pgpool supports multi-master synchronous replication as well as
failover.

Slony supports master-slave asynchronous replication and works _very_
well over long distances (such as from an east coast to a west coast
datacenter)

Once you've looked at your requirements, start looking at the tool
that matches those requirements, and I think you'll find what you
need.

BTW: does anyone know of a link that describes these high-level concepts?
If not, I think I'll write this up formally and post it.

-- 
Bill Moran
http://www.potentialtech.com

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
       message can get through to the mailing list cleanly