Re: BDR Selective Replication

Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> · Tue, 28 Apr 2015 20:14:32 -0500

On 4/27/15 7:54 PM, Craig Ringer wrote:
    If 'default replication set' is the idea of "here's what tables
    *should* be getting replicated regardless of whether that's
    happening or not", it'd be great if that was done so it could be
    split out on its own at some point. It's a problem that affects all
    replication systems.

It wasn't, but that's an interesting idea.

You need a way to identify peer nodes in an abstract way before you can
really define sets of which nodes should get which tables. So I think
replication identifiers ( https://commitfest.postgresql.org/4/161/ ) are
a prerequisite for that, and one that's proving difficult to get
in.

Perhaps... different replication systems probably use different methods
to identify nodes, so presumably there'd need to be some way to map a
generic identifier into an appropriate identifier for whatever
replication system you're using.

I think any sort of replication sets is likely to have similar problems,
especially the "no in-core user" problem. There's nothing fundamentally
impossible about filtering WAL sent to physical downstreams over
streaming replication to include only replicated tables and the
catalogs, though, so perhaps there could be an in-core user for it.

Oh, I wasn't thinking this needed to be in-core. I think it'd be a lot 
easier to develop it as an extension to start with... certainly a lot 
less headache ;) If it becomes popular then it'll be a lot easier to get 
it added.

In BDR we're currently (ab)using security labels to tag tables with
their replication sets, but I'd love to have a proper way to do that. As
I recall the prior approach, of allowing custom relation options, was
rejected on -hackers.

How would you want to go about storing and tracking the information? A
new catalog? The other issue for in-core replication sets would probably
be making it foreign-key aware, so replication of a table transitively
requires replication of its references.
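
To make the foreign-key point concrete, here is a rough sketch of the transitive closure involved. The table names and the fk_refs mapping are made up for illustration; a real implementation would presumably read the references from the catalogs instead of a hand-written dict.

```python
from collections import deque


def fk_closure(tables, fk_refs):
    """Return `tables` plus every table reachable through foreign keys.

    fk_refs maps a table name to the set of tables it references.
    """
    needed = set(tables)
    queue = deque(tables)
    while queue:
        table = queue.popleft()
        for ref in fk_refs.get(table, ()):
            if ref not in needed:  # also guards against FK cycles
                needed.add(ref)
                queue.append(ref)
    return needed


# Hypothetical schema: orders -> customers -> regions, plus an unrelated table.
fk_refs = {
    "orders": {"customers"},
    "customers": {"regions"},
    "audit_log": set(),
}

print(sorted(fk_closure({"orders"}, fk_refs)))
# ['customers', 'orders', 'regions']
```

Asking for only "orders" pulls in "customers" and "regions" as well, which is the "transitively requires" behaviour described above.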

As you said, we'd need a way to identify replication nodes. We might 
also need/want a way to specify topology. I don't think topology would 
be too hard (presumably it's either a single 'parent' node, or a list of 
peers). What might be more interesting is dealing with different systems'
methods of identifying nodes.

You'd want a way to define different sets and associate them with nodes. 
A node could be a provider, subscriber, or both. I think some 
replication systems support 'pass through' as well, where the node 
passes data downstream but doesn't apply it itself. Or it could be 
multi-master and possibly a provider to read-only subscribers.
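
One possible shape for that, purely as a sketch: the Role and Node names below are invented for illustration and don't correspond to BDR or any other existing replication system. Each node carries a per-set combination of roles, so "provider and subscriber" or "pass through" are just different flag values.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Role(Flag):
    PROVIDER = auto()      # originates changes for the set
    SUBSCRIBER = auto()    # applies changes for the set locally
    PASS_THROUGH = auto()  # forwards changes downstream without applying them


@dataclass
class Node:
    name: str
    # replication set name -> role(s) this node plays for that set
    roles: dict = field(default_factory=dict)

    def applies(self, set_name):
        """True if this node applies changes for the set locally."""
        role = self.roles.get(set_name, Role(0))
        return bool(role & Role.SUBSCRIBER)

    def forwards(self, set_name):
        """True if this node sends changes for the set downstream."""
        role = self.roles.get(set_name, Role(0))
        return bool(role & (Role.PROVIDER | Role.PASS_THROUGH))


# Hypothetical nodes for a single set called "sales".
hub = Node("hub", {"sales": Role.PROVIDER | Role.SUBSCRIBER})
relay = Node("relay", {"sales": Role.PASS_THROUGH})
replica = Node("replica", {"sales": Role.SUBSCRIBER})
```

Here "hub" behaves like a multi-master peer, "relay" only passes data along, and "replica" is a read-only subscriber.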

Finally you'd need to associate tables and sequences with a set. I agree 
you'd want to look at FKs. I'd also like to be able to define rules for 
a set, like "include everything in this schema, unless the first 
character is _".
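
A rule like that could be as simple as the following sketch. The function name, the "app" schema and the table names are all hypothetical; the point is only that membership is computed from a rule instead of being listed table by table.

```python
def select_tables(tables, schema, excluded_prefix="_"):
    """Pick the (schema, table) pairs a rule-based set would include.

    Includes every table in `schema` unless its name starts with
    `excluded_prefix`.
    """
    return [
        (s, t)
        for s, t in tables
        if s == schema and not t.startswith(excluded_prefix)
    ]


# Hypothetical catalog contents.
tables = [
    ("app", "orders"),
    ("app", "_orders_staging"),
    ("app", "customers"),
    ("reporting", "daily_totals"),
]

for schema_name, table_name in select_tables(tables, "app"):
    print(f"{schema_name}.{table_name}")
```

Evaluating the rule each time it is needed means a table created in the schema later would be picked up without anyone having to add it to the set by hand.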
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)