Re: Trying to handle db corruption 9.6

Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> · Mon, 20 May 2019 18:00:11 +0300

Yes I understand that.. I'm trying to handle it after the backup that I have taken..

On Mon, May 20, 2019, 5:49 PM Flo Rance <trourance@xxxxxxxxx> wrote:
Hi,

First of all, as stated in the wiki, you'll need to do a filesystem level copy of the database files and put them on another drive before attempting to do anything else !

https://wiki.postgresql.org/wiki/Corruption

regards,
Flo

On Mon, May 20, 2019 at 4:40 PM Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> wrote:
Hey,
I'm trying to handle a corruption that one of our customers is facing.
His disk space was full and as a result of that he decided to run pg_resetxlog a few times(bad idea..) .
When I connected to the machine I saw that the db was down. 
When I started the db (service postgresql start) I saw the next error in the logs :

DETAIL:  Could not open file "pg_multixact/offsets/0000": No such file or directory.

The pg_multixact/offset dir contained one file (0025).
The pg_multixact/members dir contains 2 files : 0000 and 0001.

I tried to follow the documentation of pg_resetxlog, and run pg_resetxlog with -m 0xF0A604,0xEA50CE which are 0025*65536  and 0026*65536  in hexa.
However, it didnt help and the same error appeared.
So I tried to rename the file to 0000 and then the db searched for a file in members that wasnt exist.
I followed the documentation and changed the multitransaction offset (-O) and the transactions id (-c ) based on the doc and then the db was started succesfully.
However after it started I saw the next msg in the logs : 
Multixact member wraparound protections are disabled because oldest checkpointed Multixact 65536 doesnt exist. In addition, no one is able to connect to the db (we keep getting errors database doesnt exist or user doesnt exist , even for postgresql user).

current relevant rows from the control data : 
pg_control version number:            960

Catalog version number:               201608131

Database system identifier:           6692952810876880414

Database cluster state:               shut down

pg_control last modified:             Mon 20 May 2019 07:07:30 AM PDT

Latest checkpoint location:           1837/E3000028

Prior checkpoint location:            1837/E2000028

Latest checkpoint's REDO location:    1837/E3000028

Latest checkpoint's REDO WAL file:    0000000100001837000000E3

Latest checkpoint's TimeLineID:       1

Latest checkpoint's PrevTimeLineID:   1

Latest checkpoint's full_page_writes: on

Latest checkpoint's NextXID:          0:3

Latest checkpoint's NextOID:          10000

Latest checkpoint's NextMultiXactId:  131072

Latest checkpoint's NextMultiOffset:  52352

Latest checkpoint's oldestXID:        3

Latest checkpoint's oldestXID's DB:   0

Latest checkpoint's oldestActiveXID:  0

Latest checkpoint's oldestMultiXid:   65536

Latest checkpoint's oldestMulti's DB: 0

Latest checkpoint's oldestCommitTsXid:4604

Latest checkpoint's newestCommitTsXid:5041

I also checked and I saw that the customer has all the wals (backed up) but without any basebackup..
Any recommendations how to handle the case ?