Re: adding another node to our pitr config

On 06/10/2009, at 11:51 PM, Geoffrey wrote:

We are currently using WAL shipping to have a hot spare of our databases. We want to add another node to this configuration. The question is, what is the best way to go about this?

Currently, our script checks whether the WAL file already exists on the target server; if not, we scp the file over. The target is a local machine, so the scp overhead is not considered an issue.

So, the current approach is:

ssh $1 "test ! -f $2/$4" && scp $3 $1:$2/$4

So, should I simply duplicate that line for the second server and place it below this one, or should they be dependent upon each other? That is:


archive_command = 'archive.sh node1 /esc/master/pitr %p %f node2'


ssh $1 "test ! -f $2/$4" && scp $3 $1:$2/$4 &&
ssh $5 "test ! -f $2/$4" && scp $3 $5:$2/$4

The second node will not be at the same location, so the network will be less reliable.

Thanks for any insights.

I've been interested in a similar setup too, although I haven't implemented it yet. I think there are at least three obvious approaches to consider. They are all basically solutions or workarounds for the following issue with multiple spare nodes: if the WAL copy to spare node 1 succeeds but the copy to spare node 2 fails, you have to handle the partial success somehow. Your suggested archive_command would keep returning failure on every retry, because the WAL segment already exists on node 1 and the `test ! -f' check short-circuits the whole chain.

1. A shared WAL spool on a fault-tolerant SAN, mounted via NFS or similar by all nodes. Then you can use a trivial `test && cp && mv' archive_command on the master node and have all the spare nodes fetch WAL files from the spool (the mv is there to make the copy atomic; see the sketch after this list). An additional advantage is that you can use the same WAL spool for periodic DB backups: you run a file-level backup once in a while and rely on the spool to accumulate the matching WALs for you. The problem with this approach is that a truly fault-tolerant shared spool can be non-trivial to implement, and the spool itself becomes a single point of failure.

2. Destructive copy. Just let your archive_command overwrite existing WAL segments (sketch below). Then a failure on node 2 simply results in the whole copy being retried from scratch.

3. Delegation of failure handling to archive_command. Instead of relying on the PostgreSQL archiver process to retry the archive_command when it returns failure, run the copy to each spare node in a loop until it succeeds (sketch below). Then you can be sure all the nodes have received the WAL segment by the time the script exits.
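For (1), a minimal sketch of what the master's archive_command might look like, assuming the spool is mounted at /mnt/wal_spool (a made-up path for the example):

archive_command = 'test ! -f /mnt/wal_spool/%f && cp %p /mnt/wal_spool/%f.tmp && mv /mnt/wal_spool/%f.tmp /mnt/wal_spool/%f'

Copying to a temporary name and then renaming makes the WAL file appear in the spool atomically, so a spare node never picks up a half-written segment; the spares' restore_command would then be a plain copy out of the spool.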
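For (2), in terms of your script that just means dropping the existence test so the copy always runs (same positional arguments as in your archive_command; a sketch only):

# Overwrite unconditionally; a segment that already made it to node 1
# simply gets copied again on a retry.
scp "$3" "$1:$2/$4" && scp "$3" "$5:$2/$4"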
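For (3), one way the wrapper could look (a rough sketch; the 5-second pause between retries is arbitrary, and note that the archiver is blocked until every node has the file):

#!/bin/sh
# archive.sh <node1> <target dir> <full WAL path (%p)> <WAL file name (%f)> <node2>
for node in "$1" "$5"; do
    # Retry this node until the segment is either already there or copied over.
    until ssh "$node" "test -f $2/$4" || scp "$3" "$node:$2/$4"; do
        sleep 5
    done
done
exit 0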

From a probabilistic point of view, (3) will be notably better than (2) only if the probability of failure for each node is high: when each individual copy rarely fails, redoing the whole set on a failure (2) costs about the same as retrying just the failed node (3), and the difference only becomes significant once failures are frequent.



