Hey Everyone, Need some help with logical replication. I’m trying to move my mail 2.5TB database to a new postgres server and upgrade from 14-15 at the same time. I’m running into some strange issues. 1. my pg_replication_slots are filling up with entries that have no active pid. I can’t figure out what is going on here. ``` 15> select * from pg_replication_slots pg_67341299_sync_92244_7249511065961178394 │ pgoutput │ logical │ 16387 │ xxx │ f │ f │ (null) │ (null) │ 698512637 │ 1270/84298C10 │ 1270/843169A8 │ reserved │ (null) │ f pg_67341299_sync_89883_7249511065961178394 │ pgoutput │ logical │ 16387 │ xxx │ f │ f │ (null) │ (null) │ 698171095 │ 126E/20D13A50 │ 126E/20D13A88 │ extended │ (null) │ f pg_67341299_sync_90126_7249511065961178394 │ pgoutput │ logical │ 16387 │ xxx │ f │ f │ (null) │ (null) │ 698167442 │ 126E/1F66FFC8 │ 126E/1F670000 │ extended │ (null) │ f pg_67341299_sync_92637_7249511065961178394 │ pgoutput │ logical │ 16387 │ xxx │ f │ f │ (null) │ (null) │ 698153629 │ 126E/1A732E08 │ 126E/1A732EE0 │ extended │ (null) │ f pg_67341299_sync_89526_7249511065961178394 │ pgoutput │ logical │ 16387 │ xxx │ f │ f │ (null) │ (null) │ 698041660 │ 126D/E5A0E4C0 │ 126D/E5A10CC8 │ extended │ (null) │ f pg_67341299_sync_89604_7249511065961178394 │ pgoutput │ logical │ 16387 │ xxx │ f │ f │ (null) │ (null) │ 698177047 │ 126E/231CBD58 │ 126E/231CBE30 │ extended │ (null) │ f pg_67341299_sync_92447_7249511065961178394 │ pgoutput │ logical │ 16387 │ xxx │ f │ f │ (null) │ (null) │ 698233353 │ 126E/53D730B0 │ 126E/53D730E8 │ extended │ (null) │ f pg_67341299_sync_93707_7249511065961178394 │ pgoutput │ logical │ 16387 │ xxx │ f │ f │ (null) │ (null) │ 698171095 │ 126E/20D1C608 │ 126E/20D1ECB0 │ extended │ (null) │ f ``` 2. I’m getting way more tables marked as copying than should be allowed. Says I have 619 tables copying. However, I’m limited to 10 replication slots on the master and have max_sync_workers_per_subscription set to 8. That should mean at any given time I have 8 tables copying during the initial data load. (Other 2 slots are for the main logical replication slot and our patroni replication). I don’t understand what is happening. I would love to bump my replication slots on the current primary, but that requires an outage. ``` 15> BEGIN; select current_timestamp, case srsubstate when 'i' then 'Initialized' when 'd' then 'Data Copying' when 'f' then 'Finished' when 's' then 'Synchronized' when 'r' then 'Ready' end as table_state, count(*) from pg_subscription_rel group by srsubstate order by srsubstate; select pg_size_pretty(pg_database_size(current_database())); rollback; BEGIN current_timestamp | table_state | count -------------------------------+--------------+------- 2023-07-16 20:23:13.780121-04 | Data Copying | 619 2023-07-16 20:23:13.780121-04 | Initialized | 721 2023-07-16 20:23:13.780121-04 | Ready | 152 2023-07-16 20:23:13.780121-04 | Synchronized | 19 (4 rows) ``` Any advice would be welcome. I need to get this database replicated and ready to failover as soon as possible. Initially we are scheduled for this Thursday (7/20) for the failover, but of course replication has to be current before then. Thanks, Chris Hoover Senior DBA AWeber.com Cell: (803) 528-2269 Email: chrish@xxxxxxxxxx |