Hi,
I'm investigating a database belonging to one of our clients. The database version is 9.2.5. The client tried to dump one of its databases and got the following error:
pg_dump: query returned 2 rows instead of one: SELECT tableoid, oid, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = datdba) AS dba, pg_encoding_to_char(encoding) AS encoding, datcollate, datctype, datfrozenxid, (SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) AS tablespace, shobj_description(oid, 'pg_database') AS description FROM pg_database WHERE datname = 'db1'
So I queried pg_database and saw that there are duplicate rows in that table:
postgres=# select xmin,xmax,datname,datfrozenxid from pg_database order by datname;
 xmin  |   xmax   |    datname     | datfrozenxid
-------+----------+----------------+--------------
  2351 |        0 | db1            |         1798
  1809 | 21093518 | db1            |         1798
  1806 |        0 | postgres       |         1798
 12594 |        0 | db2            |         1798
  1803 |        0 | template0      |         1798
  1802 |        0 | template1      |         1798
  3590 |        0 | db4            |         1798
  3592 |        0 | db3            |         1798
  1811 | 21077312 | db3            |         1798
(9 rows)
fsync and full_page_writes are both set to on.
I changed the database names, but as you can see db1 and db3 have duplicate rows. Dumping the postgres database worked. I then ran vacuum on the problematic databases: I connected to db1/2/3 and ran the vacuum command. For many of the objects I got the following detail message:
DETAIL: x dead row versions cannot be removed yet.
I'm the only one working on the database and there are no other sessions in pg_stat_activity. So why can't some of the row versions be removed?
I tried to reindex the problematic databases but got the following error:
reindexdb: reindexing of database "db1" failed: ERROR: could not access status of transaction 32212695
DETAIL: Could not open file "pg_subtrans/01EB": No such file or directory.
I checked and indeed that file doesn't exist.
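For reference, this is how I understand the transaction ID in the error maps to the file name. It is only a sketch: the constants are my assumptions for a default build (8 kB blocks, 4-byte entries, 32 pages per segment file), and `subtrans_segment` is a helper name I made up, not anything shipped with PostgreSQL.

```python
# Assumed defaults for a stock PostgreSQL build (not verified on this cluster).
BLCKSZ = 8192            # block size in bytes
XID_SIZE = 4             # bytes per entry in pg_subtrans
PAGES_PER_SEGMENT = 32   # pages stored in one segment file

# Number of transaction IDs covered by one segment file.
XIDS_PER_SEGMENT = (BLCKSZ // XID_SIZE) * PAGES_PER_SEGMENT


def subtrans_segment(xid: int) -> str:
    """Return the pg_subtrans segment file name expected to cover this xid."""
    return format(xid // XIDS_PER_SEGMENT, "04X")


print(subtrans_segment(32212695))  # 01EB
```

Under these assumptions, transaction 32212695 falls in segment 01EB, which matches the file named in the error.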
I restarted the cluster and got the same error in the log file for every database (in all cases the analyze of "pg_catalog.pg_shdepend" failed and caused the error):
2018-05-06 23:46:54 +08 30185 DETAIL: Could not open file "pg_subtrans/01EB": No such file or directory.
2018-05-06 23:46:54 +08 30185 CONTEXT: automatic analyze of table "afa.pg_catalog.pg_shdepend"
2018-05-06 23:47:06 +08 30213 ERROR: could not access status of transaction 32635595
I created a new zero-filled pg_subtrans file named 01EB and restarted the cluster:
dd if=/dev/zero of=/var/lib/pgsql/data/pg_subtrans/01EB bs=256k count=1
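As a sanity check on the size used in the dd command, here is a small sketch comparing it with what I believe one full segment file occupies. The block size and pages-per-segment values are assumed defaults, and both function names are hypothetical.

```python
# Assumed defaults for a stock PostgreSQL build (not verified on this cluster).
BLCKSZ = 8192            # block size in bytes
PAGES_PER_SEGMENT = 32   # pages stored in one segment file


def segment_bytes() -> int:
    """Size in bytes of one full segment file under the assumptions above."""
    return BLCKSZ * PAGES_PER_SEGMENT


def dd_bytes(bs_kib: int, count: int) -> int:
    """Bytes written by dd with bs=<bs_kib>k and count=<count>."""
    return bs_kib * 1024 * count


print(dd_bytes(256, 1) == segment_bytes())  # True
```

So bs=256k count=1 writes 262144 bytes, which is one full segment if those defaults hold here.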
After that I didn't get any errors in the database log.
However, I still had duplicate rows in pg_database. I tried again to reindex the problematic databases:
[root@my_host pg_subtrans]# reindexdb db1 -U postgres
Password:
NOTICE: table "pg_catalog.pg_class" was reindexed
It is stuck at that point and has not advanced to any other table. In pg_stat_activity, the state_change value for the session is not changing.
Any idea how I can proceed from here?
Thanks.