Any idea what should I do ?
בתאריך יום ב׳, 27 באוג׳ 2018 ב-15:05 מאת Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx>:
So As Tom Lane suggested I upgraded the database to the 9.2.24 version. I vacuumed (vacuum verbose) all the databases and for one of the databases (that has a duplicated record in pg_database) I got the next meesages for two of the obejcts :2018-08-27 16:57:59 +08 db4 22026 WARNING: relation "a" page 1560 is uninitialized --- fixing2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1561 is uninitialized --- fixing2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1562 is uninitialized --- fixing2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1563 is uninitialized --- fixing2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1564 is uninitialized --- fixing2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1565 is uninitialized --- fixing2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1566 is uninitialized --- fixing2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1567 is uninitialized --- fixing2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1568 is uninitialized --- fixing2018-08-27 16:58:00 +08 db4 22026 WARNING: relation "a" page 1569 is uninitialized --- fixing2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1031 is uninitialized --- fixing2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1032 is uninitialized --- fixing2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1033 is uninitialized --- fixing2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1034 is uninitialized --- fixing2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1035 is uninitialized --- fixing2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1036 is uninitialized --- fixing2018-08-27 16:58:23 +08 db4 22026 WARNING: relation "b" page 1037 is uninitialized --- fixingI tried re indexing the problematic databases :db1>reindex system db12018-08-27 17:09:44 +08 db1 23218 ERROR: could not access status of transaction 322126952018-08-27 17:09:44 +08 db1 23218 DETAIL: Could not open file "pg_subtrans/01EB": No such file or directory.I tried to delete the duplicated rows in pg_database but the delete doesnt delete anything :select ctid,xmin,xmax,datname from pg_database order by datname;ctid | xmin | xmax | datname--------+-------+----------+----------------(0,21) | 2351 | 0 | db1(0,4) | 1809 | 21093518 | db1(0,3) | 1806 | 0 | postgres(0,24) | 12594 | 0 | db2(0,2) | 1803 | 0 | template0(0,1) | 1802 | 0 | template1(0,22) | 3590 | 0 | db3(0,23) | 3592 | 0 | db4(0,5) | 1811 | 21077312 | db4(9 rows)As you can see db1 and db4 have duplicated records. I tried to delete them :delete from pg_database where ctid='(0,4)';DELETE 0but the record does exist :select ctid,datname from pg_database where ctid='(0,4)';ctid | datname-------+---------(0,4) | db1(1 row)I set the zero_damaged_pages to on but it didnt help either.How can I continue ?בתאריך יום א׳, 26 באוג׳ 2018 ב-19:42 מאת Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx>:1.I'm not really sure if they have one but its not my business.. I'm just trying to help those guys with an application that is based on postgres...2.Yeah I realized that it is an option, but do you really think that it should be the first solution ? I tried to search for bugs that seems identical to my case but I didnt find any.3.I set the vacuum_freeze_table_age to 0 and vacuumed all the duplicated databases but it seems that it didn't solve their problem.4.I tried to delete the records according to the value in xmin/xmax and the result was that after the delete postgresql didnt recognize that I have those databases. I queried the ctid and the xmin/xman :postgres=# select xmin,xmax,datname,ctid from pg_Database;xmin | xmax | datname | ctid-------+----------+----------------+--------1802 | 0 | template1 | (0,1)1803 | 0 | template0 | (0,2)1806 | 0 | postgres | (0,3)1809 | 21093518 | db1 | (0,4)1811 | 21077312 | db2 | (0,5)2351 | 0 | db1 | (0,21)3590 | 0 | db3 | (0,22)3592 | 0 | db2 | (0,23)12594 | 0 | db4 | (0,24)I tried to delete db1 with xmax 0 and db2 with xmax 2 (the opposite of what you suggested with the ctid). Now, I thought that if the xmax is set to 0 it means that I "didnt have" any update / delete operations and therefore I should delete those databases. Does it matters which one to delete ? I read about the xmin/xmax/ctid columns but I thought that what I did was legit.Moreover I realized that I have duplicated rows in more system tables so i don`t really like this solution.My next attempt will be to upgrade to 9.2.24.Thanks Tom ! ,Regards Mariel.בתאריך יום א׳, 26 באוג׳ 2018 ב-18:51 מאת Tom Lane <tgl@xxxxxxxxxxxxx>:Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> writes:
> I'm trying to investigate a database of one of our clients. The database
> version is 9.2.5.
1. Fire their DBA for dereliction of duty.
2. Update to the last available release of 9.2.x (9.2.24, looks like).
3. Vacuum everything and see if it gets better.
Vacuuming may or may not fix the observed data corruption problems, but
it's silly to ignore the fact that they're missing four years worth
of bug fixes in that branch. In particular I'm noticing the first
entry in the change notes for 9.2.6, which recommends "vacuuming all
tables in all databases while having vacuum_freeze_table_age set to zero":
https://www.postgresql.org/docs/9.2/static/release-9-2-6.html
That problem as-described seems to involve rows disappearing, rather than
getting duplicated, but I wouldn't rule out that it could also have
symptoms like this.
If that doesn't fix things, you could then try:
4. Manually eliminate duplicate rows by identifying which one of each pair
seems older and deleting it with DELETE ... WHERE ctid = '(x,y)'. Then
reindex to confirm no duplicates remain.
But you still need step 2, to reduce the odds the problem will recur.
regards, tom lane