Thanks a lot, Martín, for your replies.
On Sun, May 29, 2016 at 11:50 PM, Martín Marqués <martin@xxxxxxxxxxxxxxx> wrote:
Hi,
On 29/05/16 at 06:01, Nikhil wrote:
>
> Nik>> skip_ddl_locking is set to True in my configuration, because DDL
> locking was preventing a single node from doing DDL operations (if one
> node is down, there is no majority for doing DDL on the available node).
Well, you have to be prepared to deal with burn wounds if you play with
fire. ;)
If you decide to turn skip_ddl_locking on, you have to be sure all DDL
happens on one node; otherwise you end up with conflicts like this one.
I suggest you find out why the table was already created on the
downstream node (as a forensics task so you can avoid bumping into the
same issue).
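One way to start that check, offered as a sketch rather than an official BDR procedure, is to run the same catalog query on every node and compare which nodes already have the table and who owns it. It uses only standard PostgreSQL catalogs; the name pattern is an assumption based on the table names in the quoted log.

```sql
-- Run on each node and compare the results.
-- The LIKE pattern is an assumption; adjust it to match your tables.
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       pg_get_userbyid(c.relowner) AS owner
FROM pg_catalog.pg_class AS c
JOIN pg_catalog.pg_namespace AS n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'              -- ordinary tables only
  AND c.relname LIKE 'af_npx%'
ORDER BY n.nspname, c.relname;
```

PostgreSQL does not record when a table was created, so the owner and the set of nodes that already have the table are the main clues to which job or session ran the DDL locally.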
> Nik>> DDL used is
>
>
> ERROR: relation "af_npx_l3_16_146_10" already exists
> <596802016-05-29 08:53:07 GMT%CONTEXT: during DDL replay of ddl
> statement: CREATE TABLE public.af_npx_license_l3_16_146_10
> (CONSTRAINT af_npx_license_l3_16_146_10_rpt_sample_time_check CHECK
> (((rpt_sample_time OPERATOR(pg_catalog.>=) 1464170400) AND
> (rpt_sample_time OPERATOR(pg_catalog.<=) 1464173999))) ) INHERITS
> (public.af_npx_l3) WITH (oids=OFF)
> <554132016-05-29 08:53:07 GMT%LOG: worker process: bdr
> (6288512113617339435,2,16384,)->bdr (6288505144157102317,1, (PID 59680)
> exited with exit code 1
On the node where the CREATE TABLE is trying to get applied, run this:
BEGIN;
SET LOCAL bdr.skip_ddl_replication TO 'on';
SET LOCAL bdr.skip_ddl_locking TO 'on';
DROP TABLE af_npx_l3_16_146_10;
END;
After that, the DDL that's stuck will get applied and the stream of
changes will continue.
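To confirm that the stream has actually resumed, a rough check is to look at the replication slots and walsender connections on the upstream node. This is a sketch that assumes PostgreSQL 9.4 and uses only core system views, nothing BDR-specific.

```sql
-- Each slot should show active = true once the apply worker reconnects.
SELECT slot_name, plugin, active, restart_lsn
FROM pg_catalog.pg_replication_slots;

-- One row per connected walsender; state should be 'streaming'.
-- The *_location column names are the 9.4 names (assumed version).
SELECT application_name, state, sent_location, replay_location
FROM pg_catalog.pg_stat_replication;
```

If a slot stays inactive, or the downstream log keeps repeating the same "exited with exit code 1" line, the apply worker is most likely still stuck on a conflicting statement.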
By the looks of what you're dealing with, I wouldn't be surprised if the
replication gets stuck again on another DDL conflict.
I suggest rethinking the locking strategy, because this shows that
there's something fishy there.
Regards,
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services