After running continuously for a year or more, logical replication for my project stopped on our test DB this morning. The server claims WAL was lost due to a size limit, but no limit is configured (max_slot_wal_keep_size is -1).
The system is running CentOS 7, and I was planning to move to RHEL 8 and PostgreSQL 14.12 today, but so much for that.
Is this a bug that was fixed in a later release of 14?
Is there some other setting that must be set to get the WAL retained?
Here are the details:
Version:
PostgreSQL 14.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
Log entries (the entries after the last one listed just kept repeating that the slot was invalid):
2024-08-23 03:07:45.926 UTC [1121] LOG: starting logical decoding for slot "track_subscription"
2024-08-23 03:07:45.926 UTC [1121] DETAIL: Streaming transactions committing after AB17/4A0C9F40, reading WAL from AB17/46D98068.
2024-08-23 03:07:45.926 UTC [1121] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL AB17/554088B0 (proto_version '2', publication_names '"track_ingestion"')
2024-08-23 03:07:45.926 UTC [1121] LOG: logical decoding found consistent point at AB17/46D98068
2024-08-23 03:07:45.926 UTC [1121] DETAIL: There are no running transactions.
2024-08-23 03:07:45.926 UTC [1121] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL AB17/554088B0 (proto_version '2', publication_names '"track_ingestion"')
2024-08-23 03:08:17.161 UTC [48799] LOG: terminating process 1121 to release replication slot "track_subscription"
2024-08-23 03:08:17.161 UTC [1121] FATAL: terminating connection due to administrator command
2024-08-23 03:08:17.161 UTC [1121] CONTEXT: slot "track_subscription", output plugin "pgoutput", in the change callback, associated LSN AB17/663138F0
2024-08-23 03:08:17.161 UTC [1121] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL AB17/554088B0 (proto_version '2', publication_names '"track_ingestion"')
2024-08-23 03:08:17.190 UTC [1121] LOG: disconnection: session time: 0:00:33.502 user=sysrep database=trackdb host=postgresqldb03.s2a.nrl.navy.mil.31.250.132.in-addr.arpa port=36840
2024-08-23 03:08:17.195 UTC [48799] LOG: invalidating slot "track_subscription" because its restart_lsn AB17/4D0E3320 exceeds max_slot_wal_keep_size
trackdb=# select * from pg_replication_slots;
     slot_name      |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 track_subscription | pgoutput | logical   |  16386 | trackdb  | f         | f      |            |      |    130568429 |             | AB17/554088B0       | lost       |               | f
(1 row)
show max_slot_wal_keep_size;
 max_slot_wal_keep_size
------------------------
 -1
(1 row)
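For a sense of scale, the distance between the slot's restart_lsn in the invalidation message (AB17/4D0E3320) and the LSN that was being decoded when the walsender was terminated (AB17/663138F0) can be worked out from the log. Below is a minimal Python sketch, assuming the usual LSN text format (two hexadecimal halves, with the high 32 bits before the slash). The helper names parse_lsn and lsn_diff are made up for illustration. On the server, the built-in pg_wal_lsn_diff() does the same subtraction. This only measures the gap to the point being decoded, not to the current WAL insert position, which does not appear in the log.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as 'AB17/4D0E3320' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(newer: str, older: str) -> int:
    """Number of WAL bytes between two LSNs (newer minus older)."""
    return parse_lsn(newer) - parse_lsn(older)


# Both values are copied from the log entries above.
restart_lsn = "AB17/4D0E3320"   # restart_lsn reported when the slot was invalidated
decoding_lsn = "AB17/663138F0"  # LSN in the change callback when process 1121 was terminated

gap = lsn_diff(decoding_lsn, restart_lsn)
print(f"{gap} bytes ({gap / 1024**2:.1f} MiB)")
# prints: 421725648 bytes (402.2 MiB)
```

So the slot was holding back roughly 400 MiB of WAL relative to the decoding position at the time it was invalidated.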
Thanks,
Dennis