Log-based repliaction?

Tobias Brox <tobias@xxxxxxxxxxxxx> · Sat, 21 Oct 2006 06:22:44 +0200

I would eventually like to: 

 1) find a way to (stress) test a mockup database server.

 2) fix a near-realtime replication of our production database server
    for pulling out statistics and reports.  Minor bugs would be
    permitable in such a setup.  We don't need a full-fledged data
    warehouse solution just yet.

 3) set up a failover solution *) 

Now, why not just get the production database to log all the queries,
and feed them into a replication database?  I guess this solution
could be used for archieving any of the three things above.  This idea
is very trivial, but I never found any pointers while googling, so I
assume there are some problems with this approach?

Here is some things I can think of at the moment:

 - logging all queries at the production database will slow it down
   considerably (haven't done much testing here)

 - transactional model can easily be broken (postgres can log
   transaction starts, commits and rollbacks, and the transactions are
   also supposed to be serializable ... so I don't see the issue?)

 - disregarded due to the resource consumption on the replica server.

 - some of the transactions aren't really serializable, or relies on
   the exact timestamp for the operation. **)

 - unlike the wal, the log file doesn't get synced for every
   transaction, and this will cause delays and potentially data loss.

 ...anything else?

The simplest thing would be to have one connection open towards the
replica for every connection made towards the production database, run
every query in order, and hope the best - should work good for problem
1 and problem 2 above.

Still, maybe better (for 2) to filter out only queries altering the
table and transactions ending with a commit - and do the transactions
one by one, ordered by commit time.  Would save quite some CPU cycles
on the replica database compared to the suggestion above.

I was administring a mysql database quite some years ago ... well, it
was a simple hobby project and we didn't even need transactional
operations.  However, we did need point-in-time-recovery.  The mysql
way of fixinge this was to write a separate log containing only the
queries involving writing to the database.  This log was really nice
to work with, and it could easily be used for replication as well.

*) boss has decided that a failover solution is important and should
be prioritied in 2007 ... even to the point where he's willing to of
throw money at it.  If anyone have experiences with failover-solutions
built over a SAN, and can do consultancy services for us, please send
me a private email ;-)

**) We have had problems with transactions doing "if not exists in
database then insert".  Two such transactions can be run synchronously
and result in duplicates.