Fixing or Mitigating this ERROR: invalid page in block 35217 of relation base/16421/3192429

Abdul Qoyyuum <aqoyyuum@xxxxxxxxxxxxxxxxx> · Wed, 29 Nov 2023 12:12:00 +0800

Hi all,
Knowing that it's a data corruption issue, the only way to fix this is to vacuum and reindex the database. What was suggested was the following:

SET zero_damaged_pages = 0; # This is so that we can have the application to continue to run
VACUUM FULL VERBOSE ANALYSE; # Do a full vacuum and analyse the problem if possible.
REINDEX DATABASE "core"; # Then do a reindex and clean it up.

We're on Postgresql 12. This has worked before it happened (almost exactly a year ago) and I think this needs a more permanent solution. I've looked at routine vacuuming and checked the autovacuum is set to on and the following configurations:

core=> select name, setting from pg_settings where name like 'autovacuum%';
                name                 |  setting
-------------------------------------+-----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 60
 autovacuum_vacuum_cost_delay        | 2
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
(12 rows)

Can anyone advise if there's anything else we can do? We have no clue what causes the invalid page block and we are running a High Availability cluster set up but we are hoping that there may be a way to mitigate it.