Hi,
we have an error happening on a catalog table on one of the dbs in the instance.
This is the error report form:
A description of what you are trying to achieve and what results you expect.:
Any kind of vacuum fails on pg_authid table, I would expect it to succeed. This is occasionaly blocking autovacuums, so we must resort to manual vacuum of busy tables, which is bearable but inconvenient.
PostgreSQL version number you are running:
PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
How you installed PostgreSQL:
From the pgdg yum repositories.
Changes made to the settings in the postgresql.conf file: see Server Configuration for a quick way to list them all.
name | current_setting | source
------------------------------+------------------------------------------------------------------------------+----------------------
application_name | psql | client
archive_command | rsync -a %p barman@x.x.x.x:/data/backup/barman/ubipgsql-10/incoming/%f | configuration file
archive_mode | on | configuration file
checkpoint_completion_target | 0.9 | configuration file
checkpoint_timeout | 25min | configuration file
client_encoding | UTF8 | client
DateStyle | ISO, MDY | configuration file
default_text_search_config | pg_catalog.english | configuration file
dynamic_shared_memory_type | posix | configuration file
effective_cache_size | 64GB | configuration file
hot_standby | on | configuration file
hot_standby_feedback | on | configuration file
lc_messages | en_US.UTF-8 | configuration file
lc_monetary | en_US.UTF-8 | configuration file
lc_numeric | en_US.UTF-8 | configuration file
lc_time | en_US.UTF-8 | configuration file
listen_addresses | * | configuration file
log_autovacuum_min_duration | 1s | configuration file
log_checkpoints | on | configuration file
log_destination | syslog | configuration file
log_directory | log | configuration file
log_filename | postgresql-%a.log | configuration file
log_line_prefix | user=%u,db=%d,client=%h | configuration file
log_lock_waits | on | configuration file
log_min_duration_statement | 1s | configuration file
log_min_messages | info | configuration file
log_rotation_age | 1d | configuration file
log_rotation_size | 0 | configuration file
log_statement | ddl | configuration file
log_temp_files | 1MB | configuration file
log_timezone | Europe/Rome | configuration file
log_truncate_on_rotation | on | configuration file
logging_collector | on | configuration file
maintenance_work_mem | 1GB | configuration file
max_connections | 1000 | configuration file
max_stack_depth | 2MB | environment variable
max_wal_size | 12GB | configuration file
min_wal_size | 80MB | configuration file
password_encryption | scram-sha-256 | configuration file
pg_stat_statements.max | 10000 | configuration file
pg_stat_statements.track | all | configuration file
shared_buffers | 32GB | configuration file
shared_preload_libraries | pg_stat_statements | configuration file
syslog_facility | local0 | configuration file
syslog_ident | postgres | configuration file
TimeZone | Europe/Rome | configuration file
track_activity_query_size | 2048 | configuration file
track_functions | all | configuration file
track_io_timing | on | configuration file
wal_keep_segments | 200 | configuration file
wal_level | logical | configuration file
(51 rows)
Operating system and version:
CentOS Linux release 7.4.1708 (Core)
Linux xx.xx.com 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
What program you're using to connect to PostgreSQL:
psql, jdbc driver. It's not an application side error.
Is there anything relevant or unusual in the PostgreSQL server logs?:
Yes, we see these kind of errors:
May 20 16:06:04 ubipgsql01 postgres[26739]: [1380-1] user=,db=,client= ERROR: found xmin 2889675859 from before relfrozenxid 400011439
May 20 16:06:04 ubipgsql01 postgres[26739]: [1380-2] user=,db=,client= CONTEXT: automatic vacuum of table "postgres.pg_catalog.pg_authid"
When these errors pop up, autovacuum repeatedly fails on this very same table, and we have to resort to manual vacuums
For questions about any kind of error:
What you were doing when the error happened / how to cause the error:
The first time we saw the error, we found an idle transaction was left open for several days. The user was not monitored for long running transactions. We killed the session, deleted the offending row (the one with the xmin value reported) and then vacuumed the table.
However, teh same error pops up every time we try to add a new user. We would like to fix these error once and for all, so we can add new users.
The EXACT TEXT of the error message you're getting, if there is one: (Copy and paste the message to the email, do not send a screenshot)
This is the error:
psql (10.4)
Type "help" for help.
postgres=# vacuum pg_authid;
ERROR: found xmin 3031994631 from before relfrozenxid 400011439
Some datas:
postgres=# select xmin from pg_authid ;
xmin
------------
1
1
1
1
1
557
7216348
110077819
110511334
3031994631
3032044199
3032044199
3032044199
3032070282
(14 rows)
postgres=# select relfrozenxid from pg_class where relname='pg_authid';
relfrozenxid
--------------
400011439
(1 row)
postgres=#
Is this a sympthom of data corruption or transaction wraparound due to the long running transaction that we killed weeks ago? This is the only table in the whole cluster that has this error. We are monitoring transactions wraparound with the the check_postgres.pl script, the check is still running fine and no alert was given at all since the cluster has been running.
Thank you in advance for any answer.
Paolo Crosato