On 12 May 2015 at 14:36, Wayne E. Seguin <wayneeseguin@xxxxxxxxx> wrote:
Also, is there a way to remove these things from the init target node easier?
d= p=504 a=ERROR: 55000: previous init failed, manual cleanup is required
d= p=504 a=DETAIL: Found bdr.bdr_nodes entry for bdr (6147869128174526660,1,16908,) with state=i in remote bdr.bdr_nodes
d= p=504 a=HINT: Remove all replication identifiers and slots corresponding to this node from the init target node then drop and recreate this database and try again
Now that we have SQL-level join it'd probably make sense to provide a cleanup function for failed node joins. At this point there's no such function.
Take note of the node identity given in the error as it corresponds to the replication identifier name and slot name.
On the join target node, you need to run:
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name = bdr.bdr_format_slot_name('6147869128174526660',1,16908);
where the sysid, timeline ID and database OID are those given in the error. You must run this from the target node's database, as it'll only consider slots for the current database.
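Before dropping anything, it may help to check which slots exist for the current database. A minimal sketch, using only the standard pg_replication_slots view:

```sql
-- List the replication slots that belong to the database you are
-- connected to, so you can confirm the failed node's slot is there.
SELECT slot_name, plugin, active
FROM pg_replication_slots
WHERE database = current_database();
```

A slot that shows active = t is still in use by a connection and cannot be dropped until that connection goes away.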
Then drop the replication identifier used:
SELECT pg_replication_identifier_drop(...)
after looking up its name in pg_catalog.pg_replication_identifier. There isn't an equivalent of bdr.bdr_format_slot_name for replication identifiers; I'll look at adding one. Look it up visually or write a simple function to format the string in the meantime.
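To look the identifier up, something like the following should work. It is only a sketch: it assumes the identifier name embeds the sysid from the error message, and it matches against the whole row cast to text so that it does not depend on the catalog's column names.

```sql
-- Show any replication identifier whose row mentions the failed node's
-- sysid. The sysid is the one from the error message above.
SELECT *
FROM pg_catalog.pg_replication_identifier AS ri
WHERE ri::text LIKE '%6147869128174526660%';
```

Pass the name shown in the matching row to pg_replication_identifier_drop.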
Then delete the bdr.bdr_nodes entry for the failed-to-join node and any bdr.bdr_connections entries for it.
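As a rough sketch, the deletes might look like this. The column names (conn_sysid, conn_timeline, conn_dboid, node_sysid, node_timeline, node_dboid) are my assumption about the catalog layout, so check them with \d bdr.bdr_connections and \d bdr.bdr_nodes before running anything.

```sql
BEGIN;

-- Assumed column names; verify against your BDR version first.
-- Remove connection entries for the failed-to-join node.
DELETE FROM bdr.bdr_connections
WHERE conn_sysid = '6147869128174526660'
  AND conn_timeline = 1
  AND conn_dboid = 16908;

-- Remove the node entry itself (the one reported with state=i).
DELETE FROM bdr.bdr_nodes
WHERE node_sysid = '6147869128174526660'
  AND node_timeline = 1
  AND node_dboid = 16908;

COMMIT;
```

Running both deletes in one transaction lets you check the reported row counts and ROLLBACK instead of COMMIT if they are not what you expect.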
You *must* drop and re-create the database on the failed-to-join node, making a new blank db (preferably from template0).
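For example, assuming the database on the failed-to-join node is called mydb (a placeholder name, substitute your own):

```sql
-- Run this on the failed-to-join node while connected to a different
-- database, such as postgres. DROP DATABASE fails if you are connected
-- to the target database or if other sessions still are.
DROP DATABASE mydb;

-- Recreate it empty from template0, so nothing is carried over from a
-- locally modified template1.
CREATE DATABASE mydb TEMPLATE template0;
```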