On 2 February 2016 at 05:07, cchee-ob <carter.chee@xxxxxxxxxxxxxxxx> wrote:
I noticed that the BDR replication continually trying to replay a ddl
statement that has a syntax error. Is there anything that can be done to
skip this statement or do I need to rebuild the replicated node?
That's a DDL deparse bug. Ouch. Noted.
Honestly, rebuilding the node is the easiest option, but you can otherwise skip the transaction. It's a bit tricky and I haven't had to do it in a while or tested this procedure recently, please be cautious.
Stop the node(s) that are stuck. On the upstream node they're receiving the statement from, check pg_replication_slots to get the slot name(s).
For each slot use pg_logical_slot_peek_binary_changes('slot_name', NULL, NULL, 'interactive', 't') to find the problem xact. This is made more exciting by the fact that the protocol is binary, but you'll be able to find the SQL text in question in the stream. Determine the LSN associated with the statement. Look further down the stream until you find the commit record for it (command type 'c' in the binary protocol) and note the LSN there.
Now call pg_logical_slot_get_binary_changes('slot_name', '123/456', NULL, 'interactive', 't') to skip up to the end of the problem transaction.
When you start the BDR nodes back up they'll ask for the old replay position but it won't be available and they'll resume replay from the restart position on the slot.
This process probably seems a bit baroque. It is. One of the things we're working on right now with pglogical is a better way to skip over part of the change stream, for whatever reason desired. Starting with a decoder that lets you see the human readable protocol stream and reports the commit boundaries.
t=2016-02-01 13:02:27 PST d= h= p=21795 a=ERROR: 42601: syntax error at or
near "ON" at character 8
t=2016-02-01 13:02:27 PST d= h= p=21795 a=CONTEXT: during DDL replay of ddl
statement: GRANT ON TABLE table1 TO user2 WITH GRANT OPTION
Yup. Deparse bug.
Do you know what the original statement was?