Re: How to qualify or quantify risk of loss in asynchronous replication

otheus uibk <otheus.uibk@xxxxxxxxx> · Wed, 16 Mar 2016 10:21:41 +0100

Apologies for the double reply... This is to point out the ambiguity between the example you gave and the stated documentation.

On Wednesday, March 16, 2016, Thomas Munro <thomas.munro@xxxxxxxxxxxxxxxx> wrote:

> Waiting for the transaction to be durably stored (flushed to disk) on
> two servers before COMMIT returns means that you can avoid this
> situation:
>
> 1.  You commit a transaction, and COMMIT returns as soon as the WAL is
> flushed to disk on the primary.
>
> 2.  You communicate a fact based on that transaction to a third party
> ("Thank you Dr Bowman, you are booked in seat A4, your reservation
> number is JUPITER123").
>
> 3.  Your primary computer is destroyed by a meteor, and its WAL sender
> hadn't yet got around to sending that transaction to the standby

Section 25.2.5. "The standby connects to the primary, which streams WAL records to the standby as they're generated, without waiting for the WAL file to be filled."

This suggests that the record is on the network stack possibly before a flush to disk.

Section 25.2.6. "If the primary server crashes then some transactions that were committed may not have been replicated to the standby server, causing data loss. The amount of data loss is proportional to the replication delay at the time of failover."

Whence this replication delay? If the standby server is caught up and streaming asynchronously, what delays *in receiving* might there be other than network delays? 
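To make "replication delay" concrete, here is a minimal sketch of how I would put a number on it, assuming one can sample the primary's current WAL position and the position the standby has received or flushed. In the 9.x releases I believe those come from pg_current_xlog_location() and the sent_location / flush_location columns of pg_stat_replication, but treat that as my assumption rather than a recipe. The helper names and the sample LSN values below are made up for illustration; the only fact relied on is that an LSN is printed as two hexadecimal numbers, the high and low 32 bits of a byte position.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into an absolute byte position."""
    hi, lo = lsn.split("/")
    # High 32 bits come before the slash, low 32 bits after it.
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL the primary has generated that the standby does not yet have.

    Clamped at zero because the two samples are taken at slightly different
    moments and may appear to cross.
    """
    return max(0, parse_lsn(primary_lsn) - parse_lsn(standby_lsn))


if __name__ == "__main__":
    # Hypothetical samples: primary position vs. position flushed on the standby.
    primary = "0/3000060"
    standby = "0/3000000"
    print(wal_lag_bytes(primary, standby), "bytes of WAL at risk if the primary vanished now")
```

Sampled repeatedly under load, the distribution of that number (rather than any single reading) would be the thing to reason about: if it sits at or near zero whenever the standby is caught up, then the exposure is bounded by whatever the WAL sender has not yet pushed onto the wire.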

Note: I am totally unconcerned with the possibility that both primary and standby go down at the same time. 

-- 
Otheus
otheus.uibk@xxxxxxxxx
otheus.shelling@xxxxxxxxxx