Search Postgresql Archives

Multimaster

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Our company PostgresPRO is now developing multimaster for Postgres, some analogue of MySQL Galera. It is based on logical replication and pglogical plugin from 2ndQuandrand and uses distributed transaction manager. The main differences of multimaster from standard PostgreSQL streaming replication with hot standby are:

1. There is no time gap between master and slave (which is currently present even for synchronous streaming replication). 2. It is possible to execute any transaction on any node. In case of hot standby you can only execute read-only queries on replicas and, moreover, you can not create temporary tables to store the results of complex queries. With hot standby you have to change your application by separating read-only and read-write transactions or use special proxies like PgBouncer. 3. Multimaster provides high-availability out of the box: it is able to detect node failure and provide automatic recovery and node return to the cluster. Unlike HA stacks based on streaming replication, multimaster doesn't require any external tools like corosync and pacemaker.

Unlike bidirectional replication, multimaster provides distributed consistency: all transactions will always see consistent snapshots. If transactions on different nodes are trying to update the same data, then conflicts will be detected in the same way as in standalone Postgres. No custom conflict resolution is required. Multimaster is using two phase commit, so a transaction is either applied on all nodes or on none. Thus the data of all online nodes are identical.

Multimaster doesn't provide:
1. Write scalability: the changes have to be applied to all nodes, so there can not be any write performance gain over standalone server. If you need write scalability, use sharding. In this case you have to use Postgres-XL, GreenPlum, pg_shard, or some application-level solution. 2. Optimization of complex analytic queries using distributed query execution. Multimaster doesn't build distributed execution plans and can not speed-up execution of single heavy query. 3. Georeplication. Although it is possible to have multimaster nodes in different data centers, in practice it will lead to very low write performance. For such scenarios asynchronous BDR is much better solution.


So multimaster may be efficient for providing high availability and load balancing when most of your queries are read-only.

It will be interesting for us to get user's feedbacks and collect possible use cases and workloads for multimaster. Performance of multimaster greatly depends on relative amount of update queries and transaction size (most of multimaster overhead is related to transaction processing). In case of simple pgbench-like queries performance of three-node multimaster is about two times lower than performance of standalone Postgres on update queries and about two times higher on read-only queries. In case of complex analytic queries (like TPC-H) results are much better and multimaster provides almost linear scalability. But multimaster is not intended for read-only OLAP: there are more efficient solutions. We expect most performance benefits for OLTP with high ratio of read-only queries. This is why we try to find workloads for multimaster as close to real life as possible. Any references to benchmarks, examples of queries, scenarios, etc. are welcome.

Also we would like to get feedback for the following questions:

1. Currently logical replication requires primary key for replicated table (otherwise the updates can not be propagated).
How critical can this limitation be for potential multimaster use cases?

2. Is it necessary to support local (not replicated) tables?

3. Right now multimaster replicates the whole instance. Alternatively we can try to implement database-level replication, making it possible to provide more flexible configurations of different databases connected into different clusters. It is hard to estimate now how difficult it will be to support such mode, but first of all we want to know if this feature is really needed?

4. We plan to implement sharding support. But first we also have to understand what people want from cluster first of all? HA, load balancing, distributed queries, etc. In particular: what use cases do you see for multimaster without sharding?

5.What is the preferred way of cluster configuration: through postgresql.conf or using some special API allowing to dynamically add/remove nodes from cluster?

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux