My project's replication is failing with the following error:
2024-10-15 14:03:38.446 UTC [2840947] STATEMENT: SELECT pg_catalog.set_config('search_path', '', false);
2024-10-15 14:03:38.446 UTC [2840947] ERROR: cannot read from logical replication slot "track_subscription"
2024-10-15 14:03:38.446 UTC [2840947] DETAIL: This slot has been invalidated because it exceeded the maximum reserved size.
2024-10-15 14:03:38.446 UTC [2840947] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL 1380B/CBFAEFF0 (proto_version '2', publication_names '"track_ingestion"')
trackdb=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
track_subscription | pgoutput | logical | 16402 | trackdb | f | f | | | 406428081 | | 1380B/BAB7B328 | lost | | f
Publisher and Subscriber DB versions:
PostgreSQL 14.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit
Publisher System settings:
max_slot_wal_keep_size = -1
max_wal_size = 12GB
wal_keep_size = 0
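For anyone wanting to confirm that these are the values the running server actually uses, a query along the following lines should show them. This is only a sketch against the standard pg_settings view; the source and pending_restart columns are included on the assumption that they help spot a value set somewhere unexpected or not yet applied.

```sql
-- Show the effective WAL retention settings on the publisher.
-- "source" indicates where the value came from (default, configuration file, etc.);
-- "pending_restart" is true if a changed value has not taken effect yet.
SELECT name, setting, unit, source, pending_restart
FROM pg_settings
WHERE name IN ('max_slot_wal_keep_size', 'max_wal_size', 'wal_keep_size')
ORDER BY name;
```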
I have controls in place to prevent the replication lag from growing too much, but I was surprised to see wal_status become "lost", given what I have read about the default value of max_slot_wal_keep_size (-1, which should place no limit on the WAL a slot retains).
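As an illustration of the kind of lag check meant here (a minimal sketch, not necessarily the exact query in use), the WAL held back by each slot can be estimated on the publisher from pg_replication_slots:

```sql
-- Per-slot status and an estimate of how far the slot is behind the
-- current WAL write position. The lag is NULL if confirmed_flush_lsn is NULL
-- (for example, on physical slots).
SELECT slot_name,
       active,
       wal_status,
       safe_wal_size,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
       ) AS confirmed_flush_lag
FROM pg_replication_slots
ORDER BY slot_name;
```

This has to be run on the publisher, since pg_current_wal_lsn() cannot be called on a server that is in recovery.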
My searching on this problem suggests I should increase max_wal_size to 96GB and perhaps set max_slot_wal_keep_size = 0.
Is this correct, or is there something else I should do to prevent this from ever happening again?
Thanks,
Dennis