Hi Vijaykumar
Thank you so much for taking out so much of your time to recreate the bug.
I checked the max_logical_replication_workers on both the nodes and they are set at 4 at the moment.
The reason why it is failing is that there are 3 existing replications on the target node already and when I create this 4th one, it is not able to create that temporary replication worker, needed to copy the initial data, and therefore it never moves ahead of that state.
Since these are production databases, I can't restart them during the weekdays, so I will try to change them during the weekend.
But I did find the warning message in the target DB logs.
"WARNING,53400,"out of logical replication worker slots",,"You might need to increase max_logical_replication_workers.",,,,,,,"","logical replication worker"
"WARNING,53400,"out of logical replication worker slots",,"You might need to increase max_logical_replication_workers.",,,,,,,"","logical replication worker"
So I am sure this will work.
THANK YOU SO MUCH.
On Fri, Feb 4, 2022 at 3:04 AM Vijaykumar Jain <vijaykumarjain.github@xxxxxxxxx> wrote:
trimming the email, to avoid noise.I spent a lot of time trying multiple options/combinations and finally managed to replicate your debug output.this is when i have ,postgres=# show max_logical_replication_workers;max_logical_replication_workers---------------------------------2(1 row)on publisherpostgres@controller:~$ psql -p 5001psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))Type "help" for help.postgres=# create table t_bytea(id int primary key, somebytea bytea);CREATE TABLEpostgres=# create table t1(id int);CREATE TABLEpostgres=# create table t2(id int);CREATE TABLEpostgres=# create table t3(id int);CREATE TABLEpostgres=# create publication mypub1 for table t1;CREATE PUBLICATIONpostgres=# create publication mypub2 for table t2;CREATE PUBLICATIONpostgres=# create publication mypub3 for table t3;CREATE PUBLICATIONpostgres=# create publication mypub4 for table t3;CREATE PUBLICATIONpostgres=# create publication mypub5 for table t_bytea;CREATE PUBLICATIONpostgres=# insert into t_bytea select x,repeat(repeat('xxx', 100), 1000)::bytea from generate_series(1, 1000) x;INSERT 0 1000on subscriberpostgres@controller:~$ psql -p 5002psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))Type "help" for help.postgres=# create table t_bytea(id int primary key, somebytea bytea);CREATE TABLEpostgres=# create table t1(id int);CREATE TABLEpostgres=# create table t2(id int);CREATE TABLEpostgres=# create table t3(id int);CREATE TABLEpostgres=# create table t4(id int);CREATE TABLEpostgres=# create table t5(id int);CREATE TABLEpostgres=# create subscription mysub1 connection 'port=5001' publication mypub1;NOTICE: created replication slot "mysub1" on publisherCREATE SUBSCRIPTIONpostgres=# create subscription mysub2 connection 'port=5001' publication mypub2;NOTICE: created replication slot "mysub2" on publisherCREATE SUBSCRIPTIONpostgres=# create subscription mysub3 connection 'port=5001' publication mypub3;NOTICE: created replication slot "mysub3" on publisherCREATE SUBSCRIPTIONpostgres=# create subscription mysub4 connection 'port=5001' publication mypub4;NOTICE: created replication slot "mysub4" on publisherCREATE SUBSCRIPTIONpostgres=# create subscription mysub5 connection 'port=5001' publication mypub5;NOTICE: created replication slot "mysub5" on publisherCREATE SUBSCRIPTIONpostgres=# select count(1) from t_bytea;count-------0(1 row)postgres=# table pg_subscription_rel;srsubid | srrelid | srsubstate | srsublsn---------+---------+------------+-----------16406 | 16391 | r | 0/172283816407 | 16394 | i |16408 | 16397 | i |16409 | 16397 | i |16410 | 16384 | i |(5 rows)# as expected no data in t_bytea (as it could not get any worker)postgres=# select count(1) from t_bytea;count-------0(1 row)but logs clearly state the problem2022-02-03 23:18:31.107 IST [3430] LOG: logical replication table synchronization worker for subscription "mysub1", table "t1" has started2022-02-03 23:18:31.138 IST [3430] LOG: logical replication table synchronization worker for subscription "mysub1", table "t1" has finished2022-02-03 23:18:40.730 IST [3433] LOG: logical replication apply worker for subscription "mysub2" has started2022-02-03 23:18:40.737 IST [3433] WARNING: out of logical replication worker slots2022-02-03 23:18:40.737 IST [3433] HINT: You might need to increase max_logical_replication_workers.2022-02-03 23:18:45.865 IST [3433] WARNING: out of logical replication worker slots#publisher logs2022-02-03 23:18:31.096 IST [3427] STATEMENT: CREATE_REPLICATION_SLOT "mysub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT2022-02-03 23:18:31.106 IST [3429] LOG: starting logical decoding for slot "mysub1"2022-02-03 23:18:31.106 IST [3429] DETAIL: Streaming transactions committing after 0/1722800, reading WAL from 0/17227C8.2022-02-03 23:18:31.106 IST [3429] STATEMENT: START_REPLICATION SLOT "mysub1" LOGICAL 0/0 (proto_version '2', publication_names '"mypub1"')2022-02-03 23:18:31.106 IST [3429] LOG: logical decoding found consistent point at 0/17227C82022-02-03 23:18:31.106 IST [3429] DETAIL: There are no running transactions.2022-02-03 23:18:31.106 IST [3429] STATEMENT: START_REPLICATION SLOT "mysub1" LOGICAL 0/0 (proto_version '2', publication_names '"mypub1"')2022-02-03 23:18:31.129 IST [3431] LOG: logical decoding found consistent point at 0/17228002022-02-03 23:18:31.129 IST [3431] DETAIL: There are no running transactions.2022-02-03 23:18:31.129 IST [3431] STATEMENT: CREATE_REPLICATION_SLOT "pg_16406_sync_16391_7060540926182153512" LOGICAL pgoutput USE_SNAPSHOT2022-02-03 23:18:31.135 IST [3431] LOG: starting logical decoding for slot "pg_16406_sync_16391_7060540926182153512"2022-02-03 23:18:31.135 IST [3431] DETAIL: Streaming transactions committing after 0/1722838, reading WAL from 0/1722800.2022-02-03 23:18:31.135 IST [3431] STATEMENT: START_REPLICATION SLOT "pg_16406_sync_16391_7060540926182153512" LOGICAL 0/1722838 (proto_version '2', publication_names '"mypub1"')2022-02-03 23:18:35.718 IST [3432] LOG: logical decoding found consistent point at 0/17228382022-02-03 23:18:35.718 IST [3432] DETAIL: There are no running transactions.2022-02-03 23:18:35.718 IST [3432] STATEMENT: CREATE_REPLICATION_SLOT "mysub2" LOGICAL pgoutput NOEXPORT_SNAPSHOT2022-02-03 23:18:40.737 IST [3434] LOG: starting logical decoding for slot "mysub2"2022-02-03 23:18:40.737 IST [3434] DETAIL: Streaming transactions committing after 0/1722870, reading WAL from 0/1722838.2022-02-03 23:18:40.737 IST [3434] STATEMENT: START_REPLICATION SLOT "mysub2" LOGICAL 0/0 (proto_version '2', publication_names '"mypub2"')2022-02-03 23:18:40.737 IST [3434] LOG: logical decoding found consistent point at 0/17228382022-02-03 23:18:40.737 IST [3434] DETAIL: There are no running transactions.2022-02-03 23:18:40.737 IST [3434] STATEMENT: START_REPLICATION SLOT "mysub2" LOGICAL 0/0 (proto_version '2', publication_names '"mypub2"')2022-02-03 23:18:40.857 IST [3435] LOG: logical decoding found consistent point at 0/17228702022-02-03 23:18:40.857 IST [3435] DETAIL: There are no running transactions.2022-02-03 23:18:40.857 IST [3435] STATEMENT: CREATE_REPLICATION_SLOT "mysub3" LOGICAL pgoutput NOEXPORT_SNAPSHOT2022-02-03 23:18:46.091 IST [3437] LOG: logical decoding found consistent point at 0/17228A82022-02-03 23:18:46.091 IST [3437] DETAIL: There are no running transactions.2022-02-03 23:18:46.091 IST [3437] STATEMENT: CREATE_REPLICATION_SLOT "mysub4" LOGICAL pgoutput NOEXPORT_SNAPSHOT2022-02-03 23:18:51.941 IST [3439] LOG: logical decoding found consistent point at 0/17228E02022-02-03 23:18:51.941 IST [3439] DETAIL: There are no running transactions.2022-02-03 23:18:51.941 IST [3439] STATEMENT: CREATE_REPLICATION_SLOT "mysub5" LOGICAL pgoutput NOEXPORT_SNAPSHOT# solutionso i bump the max_logical_replication_workers = 10 and restart the db servers.as there are workers available, replication catches up and tables in sync.# on subscriberpsql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))Type "help" for help.postgres=# select count(1) from t_bytea;count-------0(1 row)postgres=# table pg_subscription_rel;srsubid | srrelid | srsubstate | srsublsn---------+---------+------------+-----------16406 | 16391 | r | 0/172283816407 | 16394 | r | 0/200011016408 | 16397 | r | 0/200014816409 | 16397 | r | 0/200014816410 | 16384 | r | 0/20001B8(5 rows)postgres=# select count(1) from t_bytea;count-------1000(1 row)so two things,can you check logs for warning messages of out of logical replication worker slots on subscribers ?can you try bumping them, and check if it catches up (srsubstate should either be r(READY) or d (INITIAL COPY))also monitor logs for both publisher and subscriber.I also figured out all the debugging steps I requested earlier were useless.--Thanks,Vijay
This correspondence (including any attachments) is for the intended recipient(s) only. It may contain confidential or privileged information or both. No confidentiality or privilege is waived or lost by any mis-transmission. If you receive this correspondence by mistake, please contact the sender immediately, delete this correspondence (and all attachments) and destroy any hard copies. You must not use, disclose, copy, distribute or rely on any part of this correspondence (including any attachments) if you are not the intended recipient(s).