On 6/8/18 8:30 PM, Jeremy Finzel wrote:
No I was referring to this from the documentation:
Avoid spuriously marking pages as all-visible (Dan Wood, Pavan
Deolasee, Álvaro Herrera)
This could happen if some tuples were locked (but not deleted).
While queries would still function correctly, vacuum would normally
ignore such pages, with the long-term effect that the tuples were
never frozen. In recent releases this would eventually result in
errors such as "found multixact nnnnn from before relminmxid nnnnn".
Hello hackers,
We got the same issue on a 9.6.10, autovacuum reports the same error
when he tried to freeze a table:
ERROR: found multixact 37009963 from before relminmxid 48606990
autovacuum was not able to freeze this table until postgres reach 1
million transactions before wraparound and refuse any new transaction.
We have an OLTP workload and I noticed queries like SELECT .. FOR SHARE.
I checked durability settings (fsync, fpw) everything seems fine. Also,
I did not notice any error on the storage (local ssd with raid controler).
We "solved" with a dump/restore on another server, also we kept previous
cluster to investigate. I want to be sure we encountered the bug solved
in 9.6.9 and it is not a new one.
If we confirm it, maybe we should advise users to perform integrity
check? I was surprised 9.6.9 avoid new appearance of corruption but
nothing to ensure if database is already corrupted.
FYI, we tried to do a pgdump on secondary:
pg_dump: Error message from server: ERROR: could not access status of
transaction 1323321209
DETAIL: Could not open file "pg_clog/04EE": No such file or directory.
I am surprised this clog file is missing on the secondary (but present
on primary) :
[...]
04EB
04EC
04ED
0CEE
0CEF
0CF0
0CF1
[...]
I ran pg_visibility's checks, they do not reveal any corruption :
select pg_check_visible(i) from (select nspname ||'.' || relname as i
from pg_class join pg_namespace on
pg_class.relnamespace=pg_namespace.oid WHERE relkind IN ('r','t','m')) as i;
(0 rows)
select pg_check_frozen(i) from (select nspname ||'.' || relname as i
from pg_class join pg_namespace on
pg_class.relnamespace=pg_namespace.oid WHERE relkind IN ('r','t','m')) as i;
(0 rows)
Here is pg_controldata and information on the table where autovacuum
reports error :
/usr/pgsql-9.6/bin/pg_controldata
pg_control version number: 960
Catalog version number: 201608131
Database system identifier: 6451990604639161176
Database cluster state: in production
pg_control last modified: Mon 05 Nov 2018 12:44:30 PM CET
Latest checkpoint location: B9AF/70A4FD98
Prior checkpoint location: B9AF/70A4B7D0
Latest checkpoint's REDO location: B9AF/70A4FD60
Latest checkpoint's REDO WAL file: 000000010000B9AF00000070
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 1:1323325854
Latest checkpoint's NextOID: 1665327960
Latest checkpoint's NextMultiXactId: 60149587
Latest checkpoint's NextMultiOffset: 174269996
Latest checkpoint's oldestXID: 3471809503
Latest checkpoint's oldestXID's DB: 16393
Latest checkpoint's oldestActiveXID: 1323325854
Latest checkpoint's oldestMultiXid: 48606990
Latest checkpoint's oldestMulti's DB: 16393
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Mon 05 Nov 2018 12:44:29 PM CET
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 750
max_worker_processes setting: 8
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
SELECT * FROM pg_class WHERE oid = 'anonymised'::regclass;
-[ RECORD 1
]-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname | anonymised
relnamespace | 2200
reltype | 16719
reloftype | 0
relowner | 16386
relam | 0
relfilenode | 438244741
reltablespace | 0
relpages | 1823173
reltuples | 6.82778e+07
relallvisible | 1822975
reltoastrelid | 438244744
relhasindex | t
relisshared | f
relpersistence | p
relkind | r
relnatts | 25
relchecks | 0
relhasoids | f
relhaspkey | t
relhasrules | f
relhastriggers | t
relhassubclass | f
relrowsecurity | f
relforcerowsecurity | f
relispopulated | t
relreplident | d
relfrozenxid | 3471809503
relminmxid | 48606990
relacl | {anonymised}
reloptions |
{autovacuum_vacuum_scale_factor=0.1,autovacuum_analyze_scale_factor=0.05}
Do you have any idea to investigate? Or steps to identify previous
corruption on another instance?
Thanks!