On 26 March 2015 at 19:08, Peter Mogensen <apm@xxxxxxx> wrote:
--
On 2015-03-26 11:57, Craig Ringer wrote:
If that's the case then BDR shouldn't make any difference.
It does. Because now with BDR you can't compare txid_current() as saved on the master with txid_snapshot_xmin() as read by the replica.
If however, you could save the txid associated with the application of the BDR replication on the target, you would again be able to compare.
My comment was with regards to it being on the local node. A master and synchronous replica isn't a local-node to local-node scenario.
Earlier you were speaking of (presumably not synchronous) streaming replicas, and writes on the master vs reads from the replica, and a way you avoid caching stale data from the read-replica using the application's cache manager to co-ordinate visibility horizons. It sounded like you were looking for a way to translate that into something that could work with BDR. Did I misunderstand?
No. I think you understood, but the goal of using txid from the master was not to co-ordinate with the master. Only to match changes on the slave with reads on the slave.
The extra property of txid's being the same on the master was not used.
The problem when using BDR is that the "slave" has it's own txid sequence and I can't get the txid of all transactions doing changes, - since some of them are via BDR and doesn't cause triggers.
Right. So that's where I think you need to look into commit timestamps as an alternative, as outlined in prior mail.
Alternately you could look at firing a C callback in bdr_apply.c when a row is replayed.
As previously mentioned it's likely to be possible to add per-row apply callbacks that can be written in plpgsql or other function languages, but no such feature currently exists in BDR.
Firing real "FOR EACH ROW ... ENABLE REPLICA" triggers may also be possible, but I haven't done any significant looking into it, and I'd be concerned about the correctness of doing so for row triggers but not statement triggers.
One issue is that it'd probably have to be able to restrict writes to unlogged tables or tables that aren't part of a replication set. If it could write to replicated tables it'd be doing so in the context of the apply worker, so the writes would not get replicated to other nodes. That'd mean the triggers would be creating rows only on one node - and that's a fast track to data divergence that can cause replication stalls and so on.
At this point I think commit timestamps are likely to be your best bet, and certainly what you should start looking into first.