Re: Postgresql replication

Chris Travers <chris@xxxxxxxxxxxxxxxxxx> · Fri, 26 Aug 2005 10:26:44 -0700

William Yu wrote:

Chris Travers wrote:

Why not have the people who have rights to review this all write to 
the master database and have that replicated back?  It seems like 
latency is not really an issue.  Replication here is only going to 
complicate 

What master database? Having a single master defeats the purpose of 
load balancing to handle more users.

I guess I am thinking along different lines than you.  I was thinking 
that the simplest solution would be to have master/slave replication for 
*approved* transactions only and no replication for initial commits 
prior to approval.  This makes the assumption that a single transaction 
will be committed on a single server, and that a single transaction will 
not be split over multiple servers.  In this way, you can commit a 
pending transaction to any single server, and when it is approved, it 
gets replicated via the master.  See below for more.
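
To make the idea concrete, here is a minimal sketch of approved-only replication, with plain Python dicts standing in for databases. The Server and Master classes and the approve() method are hypothetical names for illustration only; they are not part of PostgreSQL or of any replication tool.

```python
class Server:
    """One database node. Plain dicts stand in for tables."""
    def __init__(self, name):
        self.name = name
        self.pending = {}    # local only, never replicated
        self.approved = {}   # replicated copy of approved transactions


class Master(Server):
    """The single node through which approved transactions are replicated."""
    def __init__(self, name, slaves):
        super().__init__(name)
        self.slaves = slaves

    def approve(self, origin, txn_id):
        # Assumption from the text: the whole transaction lives on exactly
        # one server (origin), so it can be moved in one step.
        txn = origin.pending.pop(txn_id)
        self.approved[txn_id] = txn
        for slave in self.slaves:
            slave.approved[txn_id] = txn


east = Server("east")
west = Server("west")
master = Master("master", [east, west])

# A pending transaction is committed to any single server...
east.pending["t1"] = {"amount": 100}
# ...and nothing is replicated until it is approved.
assert "t1" not in west.pending and "t1" not in west.approved

master.approve(east, "t1")
```

After approve() runs, every slave holds the approved transaction and the origin no longer lists it as pending.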

> things.  If it were me, I would be having my approval app pull data
> from *all* of the databases independently and not rely on the
> replication for this part.  The replication could then be used to
> replicate *approved* data back to the slaves.

If your app client happens to have high-speed access to all servers,
and you can guarantee connections to all of them except in the rare
case of hardware failure, fine. The problem is that if you don't,
every transaction ends up running at the speed of the slowest
connection between a client and the farthest DB. While the final
status of a transaction does not need to show up anytime soon on a
user's screen, there still needs to be a fast response for each
individual user action.

Well...  It depends on how it is implemented I guess.  If you pull
transactional information in the background while the user is doing 
other things, then it shouldn't matter.  Besides, what should actually 
happen is that your connection is only as slow as the connection to the 
server which hosts the pending transaction you are trying to commit at 
the moment.  In this way, each request only goes to one server (the one 
which has the connection).  You could probably use DBI-Link and some 
clever materialized views to maintain the metadata at each location 
without replicating the whole transaction.  You could probably even use 
DBI-Link or dblink to pull the transactions in a transparent way.  Or 
you could replicate transactions into a pending queue dynamically...  
There are all sorts of ways you could make this respond well over slow 
connections.  Remember, PostgreSQL allows you to separate storage from 
presentation of the data, and this is quite powerful.
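
As a rough illustration of pulling pending transactions in the background, here is a sketch using Python threads. The SERVERS table, the latencies, and the fetch_pending() helper are all made up for the example; a real version would replace the sleep with a dblink, DBI-Link, or client-driver query.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote servers:
# name -> (simulated latency in seconds, pending rows held there).
SERVERS = {
    "local":  (0.00, [{"id": 1, "amount": 50}]),
    "remote": (0.05, [{"id": 2, "amount": 75}]),
}


def fetch_pending(name):
    """Simulate one query against one server."""
    latency, rows = SERVERS[name]
    time.sleep(latency)
    # Tag each row with the server that hosts it, so a later commit
    # only needs to talk to that one server.
    return [dict(row, server=name) for row in rows]


def prefetch_all(pool):
    """Start a fetch per server without blocking the caller."""
    return {name: pool.submit(fetch_pending, name) for name in SERVERS}


with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
    futures = prefetch_all(pool)
    # ... the user keeps working here while the fetches run ...
    pending = [row for f in futures.values() for row in f.result()]
```

The slow link only costs time when the results are finally collected, and each row remembers which server hosts it.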

How bad does the response get? I've done some simple tests comparing
APP <-LAN-> DB versus APP <-cross country VPN-> DB. Even simple
actions like inserting a record and checking for a dupe key
violation (i.e. almost no bandwidth needed) take about 10,000 times
longer than over a 100 Mbit LAN.

I think you could design the database so that duplicate keys are not an
issue: they only get checked on the master, and should then never be a
problem.
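
One possible design, and it is only an assumption on my part, is to have each server hand out keys in its own namespace so that local inserts can never collide, and to enforce uniqueness in exactly one place on the master. LocalKeyGenerator and MasterIndex below are hypothetical names for this sketch.

```python
import itertools


class LocalKeyGenerator:
    """Hands out keys of the form (server_id, n).

    Assumes every server is configured with a distinct server_id.
    """
    def __init__(self, server_id):
        self.server_id = server_id
        self._counter = itertools.count(1)

    def next_key(self):
        return (self.server_id, next(self._counter))


class MasterIndex:
    """The one place where uniqueness is actually enforced."""
    def __init__(self):
        self._seen = set()

    def accept(self, key):
        if key in self._seen:
            raise ValueError(f"duplicate key: {key!r}")
        self._seen.add(key)


east = LocalKeyGenerator("east")
west = LocalKeyGenerator("west")
master_index = MasterIndex()

# Keys are generated locally with no round trip to the master...
keys = [east.next_key(), west.next_key(), east.next_key()]
# ...and are checked only when they reach the master.
for key in keys:
    master_index.accept(key)
```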

Thinking about it....  It seems here that one ends up with a sort of 
weird "multi-master" replication based on master/slave replication if 
you replicate these changes in the background (via another process, 
Notify, etc).

I still don't understand the purpose of replicating the pending data...

Imagine a checking account. A request to make an online payment can be 
made on any server. The moment the user submits a request, he sees it 
on his screen. This request is considered pending and not a true 
transaction yet. All requests are collected together via replication 
and the "home" server for that account then can check the account 
balance and decide whether there are enough funds to issue those payments.

Thinking about this....  The big issue is that you only want to 
replicate the deltas, not the entire account.  I am still thinking 
master/slave, but something where the deltas are replicated in the 
background or where the user, in checking his account, is actually 
querying the home server.  This second option could be done via dblink or
DBI-Link and would simply require that a master table linking the 
accounts with home servers be replicated (this should, I think, be 
fairly low-overhead).
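
Roughly what I have in mind, sketched in Python with dicts in place of tables. ACCOUNT_HOME plays the part of the small replicated master table, and the account names, balances, and the overdraft rule are made up for illustration. In a real setup the lookups would be dblink or DBI-Link calls to the home server.

```python
# Hypothetical replicated-everywhere map of account -> home server.
ACCOUNT_HOME = {"acct-1": "east", "acct-2": "west"}

# Each home server holds the authoritative balance for its own accounts.
BALANCES = {
    "east": {"acct-1": 500},
    "west": {"acct-2": 120},
}


def query_balance(account):
    """Route the read to the account's home server."""
    home = ACCOUNT_HOME[account]
    return BALANCES[home][account]


def apply_deltas(account, deltas):
    """Apply replicated deltas on the home server only.

    Assumed rule: a delta that would overdraw the account is rejected.
    Returns the list of rejected deltas.
    """
    home = ACCOUNT_HOME[account]
    rejected = []
    for delta in deltas:
        new_balance = BALANCES[home][account] + delta
        if new_balance < 0:
            rejected.append(delta)
        else:
            BALANCES[home][account] = new_balance
    return rejected


# Only the deltas travel; the account itself stays on its home server.
rejected = apply_deltas("acct-2", [-100, -50, 30])
```

Only the small lookup table and the deltas have to move between servers; the account balance is never copied anywhere.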

Best Wishes,
Chris Travers
Metatron Technology Consulting
