Hello Postgres community,
I have a database cluster that crashed in a way I don’t understand.
Some details about the setup:
- The database that crashed is running postgres 14.1
- This database has three physical standbys using repmgr
- Another database in the same location subscribes to this database via logical replication (pglogical)
- The $PGDATA directory has 1.8 TB total storage and was at 50% usage before this issue occurred
- The pg_wal directory is symlinked to another partition that has 1.4 TB of available space
The way the SQL in this transaction is written (a condensed sketch follows this list):
- BEGIN
- In a DO block, create unlogged temporary tables AS SELECT * FROM the old inheritance-partitioned tables
- In a DO block, drop the old inheritance partitioned tables
- Multiple CREATE TABLE statements to recreate these tables using declarative partitioning
- DO blocks to create the individual partitions and indexes
- DO block to set proper permissions
- DO block that inserts the data from the unlogged temporary tables into the base tables and drops the temporary tables
- END
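A condensed sketch of the transaction is below. The table, column, and role names are made up for illustration only; the real script wraps these steps in DO blocks and covers many more tables.

BEGIN;

-- 1. Stage the existing data (illustrative table name)
CREATE UNLOGGED TABLE measurements_stage AS
    SELECT * FROM measurements;   -- old inheritance-partitioned parent

-- 2. Drop the old inheritance-partitioned tables
DROP TABLE measurements CASCADE;

-- 3. Recreate the table using declarative partitioning
CREATE TABLE measurements (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

-- 4. Create the individual partitions and their indexes
CREATE TABLE measurements_2021 PARTITION OF measurements
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
CREATE INDEX ON measurements_2021 (created_at);

-- 5. Restore permissions (role name is an example)
GRANT SELECT, INSERT, UPDATE, DELETE ON measurements, measurements_2021 TO app_role;

-- 6. Reload the data and drop the staging table
INSERT INTO measurements SELECT * FROM measurements_stage;
DROP TABLE measurements_stage;

END;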
I thought the problem might be that I shouldn't have active subscriptions replicating tables that are dropped and recreated within a transaction. So I tried something different: with the same subscriber still active, I first removed the tables from the replication set and then executed the transaction that replaces the inheritance tables with the declarative versions. Even with the tables already removed from the replication set, the $PGDATA/base directory keeps growing until the disk is full.
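For what it's worth, the removal from the replication set was done with calls along these lines (the replication set and table names below are just examples):

-- run on the provider, once per affected table
SELECT pglogical.replication_set_remove_table('default', 'public.measurements');

and the growth is visible with a simple size query such as:

SELECT datname, pg_size_pretty(pg_database_size(datname))
FROM pg_database
ORDER BY pg_database_size(datname) DESC;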
The growth in the Postgres data directory only stops when I drop the pglogical replication sets and replication slots (in other words, when I drop the provider node). I have confirmed it is not pg_wal that is consuming the space.
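To be concrete, dropping the provider node amounts to a call like the one below (the node name is an example), and the size checks after it are the kind of queries used to confirm the space is under base/ rather than pg_wal:

-- dropping the provider node also drops its replication sets and slots
SELECT pglogical.drop_node('provider1', true);

-- total size of pg_wal vs. the size of the database itself
SELECT pg_size_pretty(sum(size)) AS wal_total FROM pg_ls_waldir();
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;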
We did a similar migration last year (using identical SQL with different table names) when our PG was on 10.12 and did not see this issue. Hence, I am confused about why an active replication slot whose replication set no longer contains these tables would cause Postgres to fill up the base disk space. I have also compared the schemas and table sizes of the target tables before and after my SQL run, and there does not seem to be any difference.
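For the before/after comparison I used a size query roughly like this (the schema name is an example):

SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
WHERE c.relnamespace = 'public'::regnamespace
  AND c.relkind IN ('r', 'p')
ORDER BY pg_total_relation_size(c.oid) DESC;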
Postgres version on master database: 14.1
Postgres version on subscriber database cluster: 14.1
pglogical version on master: pglogical 2.4.0-1.rhel7
pglogical version on subscriber: pglogical 2.4.0-1.rhel7
I also tested this after upgrading pglogical to 2.4.1 and saw the same issue.
Thanks & Regards,
Siddhartha Gurijala