Re: Replication

Scott Marlowe <smarlowe@xxxxxxxxxxxxxxxxx> · Tue, 13 Sep 2005 17:01:10 -0500

On Tue, 2005-09-13 at 15:51, Russ Brown wrote:
> Scott Marlowe wrote:

> > Let me present it as a simple devil's choice, which would you rather
> > have, proven replication, that works, but requires you to setup a
> > secondary bit of software / system scripts (like rsync) but is tested
> > and proven to work, or, an "out of the box" solution that is untested,
> > unreliable, and possible unsafe for your data?
> > 
> 
> Why does an "out of the box" solution have to be untested, unreliable 
> and possibly unsafe for my data?

It doesn't have to be.  My main point was that many people assume that
since it's included and works out of the box that it MUST be tested and
reliable.  MySQL's replication is not particularly reliable.  As someone
who's done some "pull the plug" testing on it and PostgreSQL I can
confidently say that PostgreSQL and it's replication can survive fairly
nasty power / network failure modes, and MySQL's replication and db
engine seem much more prone to problems caused by such scenarios.  So,
to me, it's "tested" but most definitely not proven reliable.

> How about a third choice: you can also use a proven, reliable and tested 
>   replication solution that is included in the core system because the 
> core system basiclly provides it anyway. It's easy to set up, but (as 
> with all replication solutions) doesn't fit all purposes.

If PostgreSQL provided it, I'd use it.  But, I'd rather pick a
replication engine that I know works, having tested it, and wait for it
to be included in the back end at some future date.  

> Depending on my requirements, I may well choose that one.

Since my first requirement is reliable replication, MySQL's replication
is out.

> > Chosing a database because it has "out of the box" replication without
> > paying attention to how it is implemented, how well it works, and what
> > are the ways it can break is a recipe for (data) disaster.
> > 
> 
> Absolutely, but you're preaching to the converted here. I'm not 
> discussing which database I would choose. I was simply asking whether 
> the WAL logs would be utilised to create a simple replication solution 
> "Out of the box" with very little additional work on the part of the 
> developers.

I think it's getting there.  But like so many things PostgreSQL, it
won't be put into the core until it's considered mature and stable.

>  I am fully aware that no single replication 
> solution will fit all circumstances, but if PostgreSQL's core 
> functionality can provide one such solution anyway, why not add a couple 
> of scripts to make it all work out of the box and advertise the fact?

Because it's probably system dependent, i.e. the scripts / C programs
need to work on all the platforms supported by postgresql before it
would get included.  If it only worked on Linux and / or BSD, it's not
done.

> > That said, it's a great system for content management replication, where
> > downtime is fine while setting up replication.
> > 
> 
> Precicely: one situation where this solution will work well, but nobody 
> knows about it because it's not pushed as a feature.

Actually I was speaking of MySQL's replication there, in case that
wasn't clear (not sure if it was or not...)

Keep in mind, PITR just came out in release 6 months ago.  It's not
really tested thoroughly, and the ways to set it up are probably varied
enough that there's no one way that's considered "Standard" just yet.

> > But I wouldn't choose either because it was easier to implement.  Being
> > easy to implement is just sauce on the turkey.  I need the meat to be
> > good or the sauce doesn't matter.
> > 
> 
> Indeed. As said above, it has to be reliable, tested etc. I personally 
> fear a lot of the things our MySQL databases do to our data, and would 
> very much like to make the switch to PostgreSQL if allowed to do so. 
> There may be problems with MySQL's 'out of the box' replication 
> solution, but that doesn't mean that having one at all is a Bad Thing.

It's not just what MySQL's db engine is willing to do to your data, it's
how its replication engine may fail and you only find out months after
the fact when the primary goes down that all your data is 6 months or so
out of date.

> I can't help but think that had I just asked about the possibility of 
> using the WAL log idea for an 'out of the box' replication solution 
> without mentioning the word 'MySQL', the responses received would have 
> been very different.

Hmmmm.  I would imagine you would have gotten different responses as
well too.  Using WAL logs for replication is one subject, MySQL
replication is another.  PITR is a new feature that's still be fleshed
out, but is already quite useful.  MySQL's replication has a reputation
for dogdy behaviour.  So, I can't imagine the two being discussed in the
same way either.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster