
Re: Bigtime scaling of Postgresql (cluster and stuff I suppose)

Hi,

Bill Moran wrote:
First off, "clustering" is a word that is too vague to be useful, so
I'll stop using it.  There's multi-master replication, where every
database is read-write, then there's master-slave replication, where
only one server is read-write and the rest are read-only.  You can
add failover capabilities to master-slave replication.  Then there's
synchronous replication, where all servers are guaranteed to get
updates at the same time.  And asynchronous replication, where other
servers may take a while to get updates.  These descriptions aren't
really specific to PostgreSQL -- every database replication system
has to make design decisions about which approaches to support.

Good explanation!

Synchronous replication is only
really used when two servers are right next to each other with a
high-speed link (probably gigabit) between them.

Why is that so? There is certainly very valuable data that would benefit from an inter-continental database system. For money transfers, for example, I'd rather wait half a second for a round trip around the world to make sure the RDBMS does not 'lose' my money.
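To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes each synchronous commit waits for at least one full network round trip, so a single session that commits serially is capped at about 1/RTT commits per second. The round-trip figures and the function names are illustrative assumptions, not measurements of any real system.

```python
def max_serial_commits_per_second(round_trip_seconds: float) -> float:
    """Upper bound on commits/s for ONE session that waits a full
    network round trip before each commit is confirmed."""
    if round_trip_seconds <= 0:
        raise ValueError("round trip must be positive")
    return 1.0 / round_trip_seconds


# Assumed, illustrative round-trip times in seconds (not measured values).
ASSUMED_ROUND_TRIPS = {
    "same rack, gigabit link": 0.0005,
    "inter-continental link": 0.5,
}


def summarize() -> dict:
    """Map each assumed link to its per-session commit-rate ceiling."""
    return {
        name: max_serial_commits_per_second(rtt)
        for name, rtt in ASSUMED_ROUND_TRIPS.items()
    }


if __name__ == "__main__":
    for name, rate in summarize().items():
        print(f"{name}: at most {rate:.0f} commits/s per session")
```

The ceiling applies per session only. Independent sessions can overlap their waits, so a long round trip hurts latency for each transaction much more than it hurts aggregate throughput.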

PostgreSQL-R is in development, and targeted to allow multi-master,
asynchronous replication without rewriting your application.  As
far as I know, it works, but it's still beta.

Sorry, this is nitpicking, but for some reason (see current naming discussion on -advocacy :-) ), it's "Postgres-R".

Additionally, Postgres-R is considered to be a *synchronous* replication system, because once you get your commit confirmation, your transaction is guaranteed to be deliverable and *committable* on all running nodes (i.e. it's durable and consistent). Or, to put it another way: asynchronous systems have to deal with conflicting but already committed transactions; Postgres-R does not.

Certainly, this is slightly less restrictive than requiring that a transaction be *committed* on all nodes before the commit is confirmed to the client. But as long as a database session is tied to a single node, this optimization does not alter any transactional semantics. That restriction mostly holds in practice anyway, so I still consider this to be synchronous replication.

[ To get a strictly synchronous system with Postgres-R, you'd have to delay read-only transactions on a node which hasn't yet applied all remote transactions. In most cases, that's unwanted. Instead, a consistent snapshot is enough, just as if the transaction had started *before* the remote ones which still need to be applied. ]
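The difference between "delivered everywhere" and "applied everywhere" can be illustrated with a toy model. This is not Postgres-R code: the `Node` and `Cluster` classes and all their methods are hypothetical, and a simple loop stands in for the ordered delivery that a real system would get from its group communication layer. The sketch only shows that a commit can be confirmed once every node has the transaction queued in the same order, while a lagging node keeps answering reads from an older, still consistent snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica: a key/value store plus delivered-but-unapplied writes."""
    name: str
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def deliver(self, txn_id: int, writes: dict) -> None:
        # The transaction is now guaranteed to be committable here.
        self.pending.append((txn_id, writes))

    def apply_next(self) -> int:
        # Apply the oldest delivered transaction, preserving delivery order.
        txn_id, writes = self.pending.pop(0)
        self.data.update(writes)
        return txn_id

    def read(self, key):
        # Snapshot read: behaves as if it started before the pending txns.
        return self.data.get(key)

    def read_strict(self, key):
        # Strictly synchronous read: wait until all remote txns are applied.
        while self.pending:
            self.apply_next()
        return self.data.get(key)


class Cluster:
    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}
        self.next_txn_id = 1

    def commit(self, origin: str, writes: dict) -> int:
        """Deliver to every node in the same order, apply on the origin,
        then confirm. Remote nodes apply later, at their own pace."""
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        for node in self.nodes.values():
            node.deliver(txn_id, dict(writes))
        origin_node = self.nodes[origin]
        while origin_node.pending:
            origin_node.apply_next()
        return txn_id


def demo():
    cluster = Cluster(["a", "b"])
    cluster.commit("a", {"balance": 100})
    on_origin = cluster.nodes["a"].read("balance")      # sees its own commit
    on_lagging = cluster.nodes["b"].read("balance")     # older snapshot
    strict = cluster.nodes["b"].read_strict("balance")  # waits, then reads
    return on_origin, on_lagging, strict


if __name__ == "__main__":
    print(demo())
```

A session that stays on node "a" always sees its own commits, which is why tying a session to a node keeps the transactional semantics intact in this model. Only a client hopping between nodes could observe the lag.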

BTW: does anyone know of a link that describes these high-level concepts?
If not, I think I'll write this up formally and post it.

Hm.. some time before 8.3 was released, we had lots of discussions on -docs about the "high availability and replication" section of the PostgreSQL documentation. I'd have liked to add these fundamental concepts, but Bruce - rightly - wanted to keep the focus on existing solutions. And unfortunately, most existing solutions are async, single-master. So explaining all these wonderful theoretical concepts only to state that there are no real solutions would have been silly.

Regards

Markus

