Re: BDR Selective Replication

Craig Ringer <craig@xxxxxxxxxxxxxxx> · Wed, 29 Apr 2015 14:38:55 +0800

On 29 April 2015 at 09:14, Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> wrote:
On 4/27/15 7:54 PM, Craig Ringer wrote:

    If 'default replication set' is the idea of "here's what tables
    *should* be getting replicated regardless of whether that's
    happening or not", it'd be great if that was done so it could be
    split out on its own at some point. It's a problem that affects all
    replication systems.

It wasn't, but that's an interesting idea.

You need a way to identify peer nodes in an abstract way before you can
really define sets of which nodes should get which tables. So I think
replication identifiers ( https://commitfest.postgresql.org/4/161/ ) are
a prerequisite for that, and one that's proving difficult to get in.

Perhaps... different replication systems probably use different methods to identify nodes, so presumably there'd need to be some way to map a generic identifier into an appropriate identifier for whatever replication system you're using.

Replication identifiers do just that: provide a way to map identifiers from some external system into a local unique identifier for a peer node, along with tracking of the replay position from the peer so replay can be restarted at a consistent point. The replay position is an LSN, so they're not going to work for any arbitrary system, though.
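To make that mapping concrete, here is a minimal Python sketch, assuming an in-memory table keyed by whatever name the external system uses for a node. The names PeerRegistry, PeerState, local_id, advance and restart_point are hypothetical and are not the interface of the patch linked above; the only PostgreSQL-specific detail is the textual LSN form, two hexadecimal numbers separated by a slash.

```python
from dataclasses import dataclass


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN form 'hi/lo' (hex) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class PeerState:
    local_id: int        # compact identifier, unique on this node only
    replay_lsn: int = 0  # how far changes from the peer have been replayed


class PeerRegistry:
    """Hypothetical in-memory stand-in for a replication identifier catalog."""

    def __init__(self) -> None:
        self._peers = {}  # external name -> PeerState

    def local_id(self, external_name: str) -> int:
        """Map an external node name to a local id, creating it if new."""
        state = self._peers.get(external_name)
        if state is None:
            state = PeerState(local_id=len(self._peers) + 1)
            self._peers[external_name] = state
        return state.local_id

    def advance(self, external_name: str, lsn: str) -> None:
        """Record replay progress; the position never moves backwards."""
        state = self._peers[external_name]
        state.replay_lsn = max(state.replay_lsn, parse_lsn(lsn))

    def restart_point(self, external_name: str) -> int:
        """Position from which replay should resume after a restart."""
        return self._peers[external_name].replay_lsn
```

Because the stored position is an LSN, a sketch like this only helps systems whose progress can be expressed as a WAL position, which is the limitation noted above.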

How would you want to go about storing and tracking the information? A
new catalog? The other issue for in-core replication sets would probably
be making them foreign-key aware, so replication of a table transitively
requires replication of its references.
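The foreign-key point amounts to taking a transitive closure over the "references" relation. A rough Python sketch, assuming the foreign keys have already been read into a plain dict mapping each table to the tables it references (the function name and the input shape are made up for illustration):

```python
def close_over_references(tables, references):
    """Return `tables` plus every table they reference, directly or indirectly.

    `references` maps a table name to the set of tables its foreign keys
    point at. Cycles are handled because each table is visited only once.
    """
    result = set(tables)
    pending = list(result)
    while pending:
        table = pending.pop()
        for target in references.get(table, ()):
            if target not in result:
                result.add(target)
                pending.append(target)
    return result


# Hypothetical schema: order_lines -> orders -> customers, order_lines -> products
references = {
    "order_lines": {"orders", "products"},
    "orders": {"customers"},
}
replicated = close_over_references({"order_lines"}, references)
```

Asking for order_lines alone pulls in orders, products and customers, which is the behaviour a foreign-key-aware replication set would need.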

As you said, we'd need a way to identify replication nodes. We might also need/want a way to specify topology.

Topology? Why?

All a node needs to know is "send data from <these tables> to <these peers>". It's just a set. If a replication system is doing something fancy it'd be able to manage the replication sets on the nodes.
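As a sketch of how little state that is, assuming named sets of tables and a per-peer list of set names (all names and sample data below are invented for illustration and do not describe BDR's actual catalogs):

```python
def tables_for_peer(peer, replication_sets, peer_sets):
    """Union of the tables in every replication set the peer receives."""
    tables = set()
    for set_name in peer_sets.get(peer, ()):
        tables |= replication_sets.get(set_name, set())
    return tables


def peers_for_table(table, replication_sets, peer_sets):
    """Peers that should be sent changes made to `table`."""
    return {
        peer
        for peer in peer_sets
        if table in tables_for_peer(peer, replication_sets, peer_sets)
    }


# Hypothetical configuration held by a single sending node.
replication_sets = {
    "default": {"customers", "orders"},
    "reporting": {"orders", "audit_log"},
}
peer_sets = {
    "node_b": {"default"},
    "node_c": {"default", "reporting"},
}
```

Nothing here describes how the peers are connected to each other; the sending node only answers "which peers get changes to this table?".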

I don't think topology would be too hard (presumably it's either a single 'parent' node, or a list of peers). What might be more interesting is dealing with different systems' methods of identifying nodes.

Yeah, topology is hard. Rings, mesh with dangling follower nodes, etc.

I don't think it's really the same thing as replication sets.

You'd want a way to define different sets and associate them with nodes. A node could be a provider, subscriber, or both. I think some replication systems support 'pass through' as well, where the node passes data downstream but doesn't apply it itself. Or it could be multi-master and possibly a provider to read-only subscribers.

Yeah, you're talking about some kind of abstract modelling of a replication topology. I'm not sure that's at all necessary to keep track of which tables should be replicated to which nodes.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services