Also, what is the proper way to locate and remove these replication slots as mentioned in this log?
On Mon, May 11, 2015 at 11:36 PM, Wayne E. Seguin <wayneeseguin@xxxxxxxxx> wrote:
Also,Is there a way to remove these things from the init target node easier?d= p=504 a=ERROR: 55000: previous init failed, manual cleanup is requiredd= p=504 a=DETAIL: Found bdr.bdr_nodes entry for bdr (6147869128174526660,1,16908,) with state=i in remote bdr.bdr_nodesd= p=504 a=HINT: Remove all replication identifiers and slots corresponding to this node from the init target node then drop and recreate this database and try again--On Mon, May 11, 2015 at 11:33 PM, Wayne E. Seguin <wayneeseguin@xxxxxxxxx> wrote:To the wonderful BDR folks,I have a 5 node pg+bdr cluster that I've spun up. I have verified that each node can connect via psql to each other node. I am having an issue creating the bdr group and having the nodes join each other.The steps I am taking are as follows (I used pg compiled using getting started guide instructions):1. initdb2. start postgres3. create database pgbdr4. create extension btree_gist (actually I'm doing this on template1 for my purposes)5. create extension bdr6. on node 0 I am doing:SELECT bdr.bdr_group_create(local_node_name := 'pgbdr0',node_external_dsn := 'host=10.244.2.2 port=5432 user=postgres dbname=pgbdr');7. on nodes 1-4 I am doing (adjusted for the nodes IP):SELECT bdr.bdr_group_join(local_node_name := 'pgbdr1',node_external_dsn := 'host=10.244.2.6 port=5432 user=postgres dbname=pgbdr',join_using_dsn := 'host=10.244.2.2 port=5432 user=postgres dbname=pgbdr');Sometimes this works and all nodes are joined, but it isn't consistent and I'm not sure at the moment how to debug why.From the most recent attempt on node 0:pgbdr=# SELECT * FROM bdr.bdr_nodes;node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn | node_init_from_dsn---------------------+---------------+------------+-------------+-----------+------------------------------------------------------+--------------------6147857338297172162 | 1 | 16908 | r | rdpg0 | host=10.244.2.2 port=5432 user=postgres dbname=pgbdr |(1 row)From the most recent attempt on node 1:pgbdr=# SELECT * FROM bdr.bdr_nodes;node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn | node_init_from_dsn---------------------+---------------+------------+-------------+-----------+------------------------------------------------------+------------------------------------------------------6147857345042854082 | 1 | 16908 | i | rdpg1 | host=10.244.2.6 port=5432 user=postgres dbname=pgbdr | host=10.244.2.2 port=5432 user=postgres dbname=pgbdr(1 row)I have pasted some relevant logs from node 0 further below.Even if I use 'SELECT bdr.bdr_node_join_wait_for_ready();' it will usually just hang.How can I debug why the nodes are not connecting to each other find out where they getting stuck trying to join?d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000: received replication command: IDENTIFY_SYSTEM [171/773]d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: exec_replication_command, walsender.c:1291d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000: received replication command: CREATE_REPLICATION_SLOT "bdr_16908_6147857345042854082_1_16908__"LOGICAL bdrd=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: exec_replication_command, walsender.c:1291d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000: searching for logical decoding starting point, starting at 0/18E87A0d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: DecodingContextFindStartpoint, logical.c:469d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 00000: logical decoding found consistent point at 0/18E87A0d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DETAIL: There are no running transactions.d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: SnapBuildFindSnapshot, snapbuild.c:1271d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 00000: exported logical decoding snapshot: "0000071E-1" with 0 transaction IDsd=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: SnapBuildExportSnapshot, snapbuild.c:604d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000: received replication command: IDENTIFY_SYSTEMd=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: exec_replication_command, walsender.c:1291d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000: received replication command: CREATE_REPLICATION_SLOT "bdr_16908_6147857344392790209_1_16908__"LOGICAL bdrd=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: exec_replication_command, walsender.c:1291d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000: searching for logical decoding starting point, starting at 0/18EC970d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: DecodingContextFindStartpoint, logical.c:469d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000: logical decoding found initial starting point at 0/18EC970d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DETAIL: 1 transaction needs to finish.d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: SnapBuildFindSnapshot, snapbuild.c:1338d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000: received replication command: IDENTIFY_SYSTEMd=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: exec_replication_command, walsender.c:1291d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000: received replication command: CREATE_REPLICATION_SLOT "bdr_16908_6147857340000055490_1_16908__"LOGICAL bdrd=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: exec_replication_command, walsender.c:1291d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 08006: unexpected EOF on client connection with an open transactiond=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: SocketBackend, postgres.c:344d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000: searching for logical decoding starting point, starting at 0/18EC9F0d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: DecodingContextFindStartpoint, logical.c:469d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000: logical decoding found initial starting point at 0/18EC9F0d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DETAIL: 1 transaction needs to finish.d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: SnapBuildFindSnapshot, snapbuild.c:1338d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000: logical decoding found consistent point at 0/18ECA30d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DETAIL: Transaction ID 1822 finished; no more running transactions.d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: SnapBuildEndTxn, snapbuild.c:907d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000: logical decoding found consistent point at 0/18ECA30d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DETAIL: Transaction ID 1822 finished; no more running transactions.d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: SnapBuildEndTxn, snapbuild.c:907d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000: exported logical decoding snapshot: "00000720-1" with 0 transaction IDsd=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: SnapBuildExportSnapshot, snapbuild.c:604d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000: exported logical decoding snapshot: "0000071F-1" with 1 transaction IDd=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: SnapBuildExportSnapshot, snapbuild.c:604d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000: received replication command: IDENTIFY_SYSTEMd=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: exec_replication_command, walsender.c:1291d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000: received replication command: CREATE_REPLICATION_SLOT "bdr_16908_6147857340923527359_1_16908__"LOGICAL bdrd=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: exec_replication_command, walsender.c:1291d= p=392 a=DEBUG: 00000: started 0 electionsd= p=392 a=LOCATION: bdr_sequencer_start_elections, bdr_seq.c:688d= p=392 a=DEBUG: 00000: started 0 votesd= p=392 a=LOCATION: bdr_sequencer_vote, bdr_seq.c:629d= p=392 a=DEBUG: 00000: tallied 0 electionsd= p=392 a=LOCATION: bdr_sequencer_tally, bdr_seq.c:755d= p=392 a=DEBUG: 00000: checked 0 sequences for fillingd= p=392 a=LOCATION: bdr_sequencer_fill_sequences, bdr_seq.c:1034d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 08006: unexpected EOF on client connection with an open transactiond=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: SocketBackend, postgres.c:344d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 08006: unexpected EOF on client connection with an open transactiond=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: SocketBackend, postgres.c:344d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000: searching for logical decoding starting point, starting at 0/18ECA60d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: DecodingContextFindStartpoint, logical.c:469d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000: logical decoding found initial starting point at 0/18ECA60d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DETAIL: 2 transactions need to finish.d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: SnapBuildFindSnapshot, snapbuild.c:1338d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000: logical decoding found consistent point at 0/18ECAD0d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DETAIL: Transaction ID 1823 finished; no more running transactions.d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: SnapBuildEndTxn, snapbuild.c:907d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000: exported logical decoding snapshot: "00000721-1" with 0 transaction IDsd=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: SnapBuildExportSnapshot, snapbuild.c:604d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 08006: unexpected EOF on client connection with an open transactiond=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: SocketBackend, postgres.c:344