On Wed, Mar 16, 2016 at 6:26 AM, otheus uibk <otheus.uibk@xxxxxxxxx> wrote: > I've been working with PG 9.1.8 for two years now, mainly asynchronous > replication. Recently, an IT admin of another group contested that the PG's > asynchronous replication can result in loss of data in a 1-node failure. > After re-readinG the documentation, I cannot determine to what extent this > is true. It is true. If the primary server is destroyed by a meteor, it is entirely possible for recently written WAL records to be lost, because they haven't even been sent to an asynchronous standby node yet, let alone written. > Back in 2008, Robert Haas made this post > http://postgresql.nabble.com/Sync-Rep-First-Thoughts-on-Code-tp1998339p1998433.html > in which he delineates between different levels of replication. 1-safe is > guaranteed with PG WALs. Other possibilities include group-safe, both group > safe and 1-safe, 2-safe. > > How do we qualify PG when WALs are written (and archived) on the master, and > streaming replication to a hot standby, albeit asynchronous, is used? Is it > Group-safe? > > My understanding is "Strictly speaking, No". No. There is no guarantee that any other node knows about your transaction. > But what precisely is the algorithm and timing involved with streaming WALs? > > Is it: > * client issues COMMIT > * master receives commit > * master processes transaction internally > * master creates WAL record > | master appends WAL to local WAL file, flushes disk > | master sends WAL record to all streaming clients > * master sends OK to client > * master applies WAL > > So is this correct? Is it correct to say: PG async guarantees that the WAL > is *sent* to the receivers, but not that they are received, before the > client receives acknowledgement? Async replication doesn't guarantee anything at all about receivers, or even that there is one right at this moment. Did you mean to write "synchronous" instead of "asynchronous"? In asynchronous replication, the primary writes to the WAL and flushes the disk. Then, for any standbys that happen to be connected, a WAL sender process trundles along behind feeding new WAL doesn the socket as soon as it can, but it can be running arbitrarily far behind or not running at all (the network could be down or saturated, the standby could be temporarily down or up but not reading the stream fast enough, etc etc). > Can we make a case stronger than that? Assuming the T0 is round trip time > between master and client, and T1 is round trip time between master and > slave, as long as T1 <= T0, and provided both Slave and Master do not fail, > the system is Group-safe? You might use that kind of thinking to reason about the probability that a transaction has reached the standby in async mode by the time your client gets a commit response back, but it's not any kind of useful guarantee, and in any case only applies if your standby is currently connected and keeping up. That's why we have synchronous replication. If you turn on synchronous replication (using the synchronous_standby_names GUC which takes a list of standby 'application' names, or * for any) then you can stop COMMIT from returning until one standby from that list has written out the WAL record. (In future we will probably support more than one). That works using the same approach as asynchronous replication, except that there is a wait inserted into the committing transaction: it waits for the current synchronous standby to report back that it has processed the commit record. There are two levels: synchronous_commit = remote_write, meaning that the chosen standby has written the WAL but not necessarily flushed it to disk yet (I think this may be called "1-safe and group safe" in the terminology you referenced: it's flushed locally AND betting that the other machine(s) won't (all) crash), and synchronous_commit = on, meaning that the chosen standby has written it actually flushed it to disk (with fsync, fdatasync etc, "2-safe"). The former might be faster, but could lose writes that are in the OS page cache on the standby if power is lost before those pages eventually hit the disk, so "on" is probably what most people mean when they talk about synchronous replication. Asynchronous replication doesn't wait for anything except the local disk (so it is "1-safe"). Waiting for the transaction to be durably stored (flushed to disk) on two servers before COMMIT returns means that you can avoid this situation: 1. You commit a transaction, and COMMIT returns as soon as the WAL is flushed to disk on the primary. 2. You communicate a fact based on that transaction to a third party ("Thank you Dr Bowman, you are booked in seat A4, your reservation number is JUPITER123"). 3. Your primary computer is destroyed by a meteor, and its WAL sender hadn't yet got around to sending that transaction to the standby. 4. You recover using the standby. 5. The transaction has been forgotten ("I'm sorry Dave, I'm afraid we have no record of booking JUPITER123, and the rocket is full. The next rocket leaves in 7 years, would you like to book a seat?"). If you enable synchronous replication, and you are careful to recover in step 4 using the correct standby, then you can't lose a transaction that you reported to external systems *after* (because) COMMIT returned. If your primary is destroyed after you executed COMMIT, but *before* it returned, it is possible that the current synchronous standby's WAL contains the transaction or doesn't contain the transaction, but not for you to have taken any external action based on the commit having returned, because it didn't. (If your primary crashes and restarts before COMMIT returns, and it had got as far as flushing locally but not yet heard from the standby, then things may be slightly more complicated). -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general