[I'm not sure whether this is the right list to discuss third party extensions like 2ndQuadrant's pglogical. If there is a better place, feel free to direct me to it] We are testing bidirectional replication with pglogical: Node 1: postgresql-9.6 9.6.6-1.pgdg90+1 postgresql-9.6-pglogical 2.1.0-1.jessie+1 Node 2: postgresql-10 10.1-1.pgdg90+1 postgresql-10-pglogical 2.1.0-1.stretch+1 (They are on different versions because we wanted to test a staggered upgrade) We have configured bidirectional replication on several tables. In general that seems to work fine. When a row is inserted/updated/ deleted on one node, it is replicated to the other. But now I seem to have triggered a replication loop: wds=> select id, source_id, insert_ts, start_ts, update_ts, finished_ts, module_id, status_id, status_msg, pid, username from qualitygate.jobs where id = 19142008; id | source_id | insert_ts | start_ts | update_ts | finished_ts | module_id | status_id | status_msg | pid | username ----------+-----------+----------------------------+----------+-----------+-------------+-----------+-----------+------------+-----+---------- 19142008 | 9 | 2017-12-07 11:39:09.904071 | | | | 80 | 0 | | 0 | hjp (1 row) wds=> select id, source_id, insert_ts, start_ts, update_ts, finished_ts, module_id, status_id, status_msg, pid, username from qualitygate.jobs where id = 19142008; id | source_id | insert_ts | start_ts | update_ts | finished_ts | module_id | status_id | status_msg | pid | username ----------+-----------+----------------------------+----------------------------+----------------------------+-------------+-----------+-----------+------------+------+---------- 19142008 | 9 | 2017-12-07 11:39:09.904071 | 2017-12-07 11:39:09.914616 | 2017-12-07 11:39:09.914616 | | 80 | 0 | | 6863 | hjp (1 row) wds=> select id, source_id, insert_ts, start_ts, update_ts, finished_ts, module_id, status_id, status_msg, pid, username from qualitygate.jobs where id = 19142008; id | source_id | insert_ts | start_ts | update_ts | finished_ts | module_id | status_id | status_msg | pid | username ----------+-----------+----------------------------+----------------------------+---------------------------+---------------------------+-----------+-----------+------------+-----+---------- 19142008 | 9 | 2017-12-07 11:39:09.904071 | 2017-12-07 11:39:09.914616 | 2017-12-07 11:39:09.95925 | 2017-12-07 11:39:09.95925 | 80 | 0 | | -1 | hjp (1 row) Subsequent selects randomly return one of three states for this row (other rows are stable). The order above is the "logical order", i.e. the row was first inserted with the first state, then updated to the second, and finally to the third. I suspect that the updates are bounced between the two nodes eternally overwriting each other and never reaching a stable state. Has anybody seen this? If so, is there a way to reliably avoid this? (Maybe with a different pglogical.conflict_resolution setting, but if it was a conflict, I should see something in the logs, right?) hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@xxxxxx | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
Attachment:
signature.asc
Description: PGP signature