HA and Replication - how to choose among all the available solutions

Pascal Cohen <pcohen@xxxxxxxxx> · Fri, 30 Nov 2007 22:30:47 +0100

Hello

I am facing a probably very common problem. I made a search in the 
recent archives and could find many posts related to my issue. But I did 
not get exactly "the answer" to my question.
How can I chose among the different existing tools to enable High 
Availability and Load-balancing/Replication.
In fact our situation is common. We have a DB that is growing and we 
need to ensure High-availability.
That needs some standby solution.
On the other hand we have few updates and many reads and when we are 
trying some simulations, the number of requests may become too high for 
our server (I mean requests fast - less than 4ms with explain run- 
starts to be logged with a threshold to 200ms) so we have thought to 
replication to balance the request load among several machines. We can 
afford some delay among the DB (which means sync replication is welcome 
but not mandatory and should probably be avoided if we want to improve 
the perfs).
I had mainly a look at several solutions like:
PGCluster - but I could not get some fresh stuff
Slony-I - seems mature and used
PgPool-II - active and interesting project
PostgreSQL-R - was not able to test it.
C-JDBC/Sequoia.

I mainly focused on PgPool-II and Slony which seemed to be active projects.
I also looked at the Log-Shipping section in PostgreSQL documentation to 
ensure high-availability.

Concerning the high-availability issue, I can in fact imagine using 
log-shipping, or Slony-I or PgPool-II in multi-master mode with only one 
server available.
But I don't know how to chose a solution. I don't have much experience 
and can't see precisely the advantages and the drawbacks of each solution.

Concerning Scalability, I think I would need PgPool-II connected in 
front to at least load balance select to the different servers. But I 
can use the "multi-master" solution or rely on Slony to replicate data 
while PgPool balances the select and only sends updates to the Master. 
In fact that last solution makes me more comfortable because I fear that 
multi-master can lead to complex conflict resolution while a single 
master avoids such concern.
With a single master I can't have any data mismatch case I was able to 
simulate with PgPool-II (inserting manually in a DB a duplicate value or 
adding a trigger to simulate an exception).
What is the experience concerning that kind of question?

Mixing both Scalability and High-availability: with PgPool-II running in 
"multi-master" solution, no real issue. With single master it is less 
obvious :(
I had imagined a mixed solution using a Master relying on a warm standby 
using log-shipping and Linux-HB while Replicated slaves would be 
replicated using Slony and load balancing would be managed by PgPool-II. 
But perhaps it is stupid and using only PgPool-II is safe enough.
And why should I chose Slony instead of Log-shipping to update the slaves ?

The hardest in the situation is to make the (or at least a) good choice 
among all the available solutions and to chose a realistic one

That are many questions in this quite long post.
Thanks for any help or advice you could give me.

Pascal

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster