This is what I have in mind:
Have a plperl function that creates connections to all servers in the
cluster (replication partners) and issues the supplied write query to
them, possibly with a tag of some sort to indicate that it is a replicated
query (to prevent circular replication).
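Something like this, perhaps (an untested sketch; the peer DSNs, the
'replicator' role and the function name are all made up, and it assumes
plperlu with DBI available):

    CREATE OR REPLACE FUNCTION replicate_query(text) RETURNS boolean AS $$
        use DBI;
        my ($query) = @_;
        # Hypothetical peer list; in practice this would live in a config table.
        my @peers = ('dbi:Pg:host=node2;dbname=app',
                     'dbi:Pg:host=node3;dbname=app');
        for my $dsn (@peers) {
            # Connecting as a dedicated role doubles as the "replicated" tag:
            # the trigger on the remote side skips statements from this role.
            my $dbh = DBI->connect($dsn, 'replicator', 'secret',
                                   { RaiseError => 1 });
            $dbh->do($query);   # dies on failure, aborting our transaction
            $dbh->disconnect;
        }
        return 1;
    $$ LANGUAGE plperlu;

A die from RaiseError propagates up and aborts the local transaction, which
is what gives the "roll back everywhere" behaviour described below.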
Have a post-execute trigger that, if the query was issued directly (as
opposed to replicated) and completed successfully, calls the above
replication function and passes it the query just executed.
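The catch is that PostgreSQL doesn't hand the statement text to a trigger,
so this would need something like current_query() (where available) to fish
it out. An untested sketch, with a made-up table name:

    CREATE OR REPLACE FUNCTION replicate_after() RETURNS trigger AS $$
        # Skip statements that arrived via replication: they come in
        # under the dedicated 'replicator' role used by replicate_query().
        my $ctx = spi_exec_query(
            'SELECT current_user AS usr, current_query() AS qry')->{rows}[0];
        if ($ctx->{usr} ne 'replicator') {
            my $plan = spi_prepare('SELECT replicate_query($1)', 'text');
            spi_exec_prepared($plan, $ctx->{qry});
        }
        return;
    $$ LANGUAGE plperl;

    CREATE TRIGGER t_replicate AFTER INSERT OR UPDATE OR DELETE
        ON some_table FOR EACH STATEMENT
        EXECUTE PROCEDURE replicate_after();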
Not sure here if you mean literally the SQL query that was executed - in
which case you have all sorts of problems with sequences and functions
returning different values.
Indeed, but sequences at least can be worked around. Post-execute, the
sequence number used should already be available, and the sequence offset
and increment can be set so as to ensure the nodes never clash. That's what
MySQL does (and I must apologize for making the comparison all the time).
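For example, with two masters (hypothetical sequence name):

    -- Same increment on both nodes, different offsets, so node 1 hands
    -- out odd numbers and node 2 even ones - they can never collide.
    ALTER SEQUENCE orders_id_seq INCREMENT BY 2;  -- run on both nodes
    SELECT setval('orders_id_seq', 1);            -- on node 1
    SELECT setval('orders_id_seq', 2);            -- on node 2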
If the replication failed on any node, the whole thing gets rolled back.
This would effectively give star topology synchronous replication with
very little effort, and no need for any external code. Am I missing
something obvious that would prevent this from working? It would give
replication capabilities better than MySQL's (which can only handle
ring-based multi-master replication) for the sake of about 100 lines of
code. None of the required functionality is new to PostgreSQL, either.
But there are plenty of solutions that do a lot better than this. Slony-I is
the most popular. My favourite is a spin on the old db_mirror that used to be
part of the Postgres distribution.
How would Slony be better? It doesn't seem to support master-master
replication for one.
I can't talk about how Slony works, but db_mirror uses a very fast 'C'
function to capture changes in a set of simple replication tables. A
replication process then takes data from those tables and replicates it
(using actual values, not the SQL statement) to any number of other servers.
If one
of the servers is down, the data remains in the replication tables until that
node returns (or is removed).
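For illustration only (this is not db_mirror's actual schema), the capture
tables need be no more complicated than:

    CREATE TABLE repl_queue (
        id       bigserial PRIMARY KEY,  -- preserves capture order
        relname  name      NOT NULL,     -- table the change applies to
        op       char(1)   NOT NULL,     -- 'I', 'U' or 'D'
        pkvals   text,                   -- key values for UPDATE/DELETE
        data     text                    -- new column values, not SQL text
    );

The replication process removes rows once every node has applied them; if a
node is down, they simply accumulate until it comes back.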
Interesting. I was thinking about making an auxiliary feature that just
writes a pending queue log for a server when it cannot establish the $dbh,
and when it manages to connect, it attempts to re-play the log before
issuing new queries.
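Something along these lines (untested sketch; the spool location, the
'replicator' role and the one-statement-per-line spool format are all
assumptions - multi-line statements would need escaping):

    use DBI;

    # Map a peer's DSN to its on-disk spool file (hypothetical location).
    sub spool_file {
        my ($dsn) = @_;
        (my $name = $dsn) =~ s/\W/_/g;
        return "/var/spool/repl/$name.log";
    }

    # Replay any backlog in its original order, then empty the spool.
    sub replay_spool {
        my ($dbh, $file) = @_;
        return unless -s $file;
        open my $log, '<', $file or die $!;
        while (my $sql = <$log>) {
            chomp $sql;
            $dbh->do($sql);
        }
        close $log;
        truncate $file, 0 or die $!;
    }

    # Send a statement to a peer, spooling it if the peer is unreachable.
    sub send_or_spool {
        my ($dsn, $sql) = @_;
        my $dbh = DBI->connect($dsn, 'replicator', 'secret',
                               { RaiseError => 0, PrintError => 0 });
        if (!$dbh) {
            open my $log, '>>', spool_file($dsn) or die $!;
            print $log "$sql\n";
            close $log;
            return;
        }
        $dbh->{RaiseError} = 1;
        replay_spool($dbh, spool_file($dsn));   # drain the backlog first
        $dbh->do($sql);
        $dbh->disconnect;
    }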
The problem with this is that the ordering becomes inconsistent with
multiple masters. That would be a bit too inconsistent for my liking. As
far as I can tell, that is also why MySQL's current replication method is
unsuitable for anything more than ring replication. Having said that, ring
replication suffers from similar race conditions; it's more of a hack than a
solution.
Now that I think about it, I'm not actually sure that waiting for global
success before the final commit would make UPDATE/DELETE race-free: those
statements won't fail outright, but they can still yield inconsistencies due
to race conditions. Still, I think it's worth having despite this issue.
The problem with db_mirror was that the replication process was written in
Perl. This worked fine for simple tests but was ridiculously slow for
replicating tables holding big BYTEA structures. I re-wrote the replication
code in 'C' and it can replicate just about arbitrarily complex transactions
in close to real time.
Yes, I can see how big blobs can be an issue for performance. :-(
You seem to be re-inventing the wheel, and the re-invention is not quite as
round as the existing wheel :-)
Not quite - I think multi-master capability is important.
Is there an existing implementation of this? Perhaps a Perl program that
creates the required triggers and stored procedures from looking at a
schema?
What you've described here would be pretty simple to implement. However,
I think you've greatly underestimated the performance issues involved. If
you need to push data to multiple databases before each transaction
commits I think you'll find that pretty slow.
Only if transactions are used. I'm basing the requirements on "at least as
good as MySQL", which this would meet without transactions. If transactions
are wanted they could be enabled; otherwise it could just be fire-and-forget
asynchronous replication à la MySQL. Having a choice between
transactions and speed is good. :-)
Synchronous replication tends to imply it works on all servers simultaneously
or not on any. If any server fails a transaction it's rolled back on all
servers. What you're describing sounds asynchronous to me.
Indeed, I spotted that above. The transactions roll back if they fail, but
this alone does not quite ensure cross-node consistency of the data. Some
kind of special DELETE/UPDATE handling would be required to fix this, but
I don't have a definitive idea on how this could be handled. Will have to
think about it a bit more.
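One direction worth exploring is two-phase commit, which PostgreSQL has
supported since 8.1 (it needs max_prepared_transactions > 0). That would at
least make the commit itself all-or-nothing across peers; the transaction
identifier below is made up:

    -- Phase 1: on every peer, inside the replication connection:
    BEGIN;
    -- ... the replicated statement(s) ...
    PREPARE TRANSACTION 'repl-42';   -- the peer votes "ready"

    -- Phase 2: only once every peer has prepared successfully:
    COMMIT PREPARED 'repl-42';
    -- If any peer fails to prepare, issue ROLLBACK PREPARED 'repl-42'
    -- on the ones that did, and nothing is committed anywhere.

That still wouldn't serialize conflicting UPDATEs/DELETEs across nodes,
though, so the race condition problem remains.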
One thing I haven't quite thought of a good way to do with this approach is
the equivalent of the useful (albeit deprecated) LOAD DATA FROM MASTER
command, that gets the server in sync by dropping and re-loading all the
tables from the master(s) (or rather, peers in a multi-master star
replication), and then enables replication for it. It would be neater than
requiring downtime or global write locks. But I guess that could wait until
version 2. :)
That's one thing. The other problem that most trigger-based replication
systems have is propagating schema changes - because (I think) you can't
attach triggers to schema changes.
Yes, I had thought of that a minute ago. I don't suppose this could be
deemed a feature request for CREATE/ALTER/DROP schema-level triggers? ;)
Gordan