On Fri, Feb 7, 2020 at 1:47 AM Nick Renders <postgres@xxxxxxxxxx> wrote:
> Thank you for the feedback, Alvaro.
>
> Unfortunately, the database is no longer "dumpable". We were able to do
> a pg_dump yesterday morning (12 hours after the crash + purging the
> pg_clog) but if we try one now, we get the following error:
>
> unexpected chunk number 1 (expected 0) for toast value 8282331 in
> pg_toast_38651
>
> Looking at our data, there seem to be 6 tables that have corrupt
> records. Doing a SELECT * for one of those records, will return a
> similar error:
>
> missing chunk number 0 for toast value 8288522 in pg_toast_5572299
>
>
> What is the best way to go from here? Is tracking down these corrupt
> records and deleting them the best / only solution?
> Is there a way to determine of there are issues with new data (after the
> crash)?
>
> Any help and advice is very much appreciated.

This error indicates that the file did exist already, it was just shorter than we expected:

2020-02-04 15:20:44 CET DETAIL: Could not read from file "pg_clog/00EC" at offset 106496: Undefined error: 0.

What was the length of the file before you overwrote it? Are there 00EB and 00ED files, and if so what size? When your server rebooted, did crash recovery run or had it shut down cleanly? Do you know if the machine lost power, or the kernel crashed, or if it was a normal reboot? What are your settings for "fsync" and "wal_sync_method"? What is the output of pg_controldata -D pgdata?

I wonder if that part of the clog file was supposed to be created before the checkpoint (ie the checkpoint is somehow borked), or was supposed to be created during recovery after that checkpoint (something else is borked, but I don't know what), or if the xid is somehow corrupted.

Here's a dirty trick that might help rescue some data.
Assuming you have a copy of the original file before you zeroed it, you could write a 256kB file full of 0x55. That's 01010101 in binary and represents 4 commits, so if you fill the file up with that it means 'all transactions in this range committed', which is probably closer to the truth than all zeroes. Then copy the original shorter file over the top of it, so that at least the range of transactions represented by the earlier part of the file that did make it to disk is preserved, and we just have bogus force-everything-to-look-committed data after that.

But as Alvaro said, this is a pretty bad situation: this is key metadata used to interpret all other data files, so all bets are off here. This is restore-from-backups territory.
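
For what it's worth, here is a rough Python sketch of that overlay trick, in case it is easier to follow than the prose. It is only an illustration under assumptions: the function name and the two command-line paths are made up for this example, the 256kB size and the 0x55 fill byte are the ones mentioned above, and it writes to a new output file rather than touching anything in the data directory. Stop the server and keep copies of everything before putting any such file in place.

```python
import sys
from pathlib import Path

SEGMENT_SIZE = 256 * 1024   # 256kB, the file size mentioned above
ALL_COMMITTED = 0x55        # 01010101: four 2-bit "committed" entries per byte


def build_patched_segment(original: bytes, size: int = SEGMENT_SIZE) -> bytes:
    """Return the original bytes followed by 0x55 padding up to `size`.

    This has the same effect as writing a file full of 0x55 and then copying
    the shorter original over the start of it without truncating.
    """
    if len(original) > size:
        raise ValueError("original file is longer than a full segment")
    padding = bytes([ALL_COMMITTED]) * (size - len(original))
    return original + padding


def main(saved_copy: str, output: str) -> None:
    # saved_copy: hypothetical path to the short file saved before zeroing.
    # output: hypothetical path of a NEW file, outside the data directory.
    original = Path(saved_copy).read_bytes()
    Path(output).write_bytes(build_patched_segment(original))
    print(f"kept {len(original)} original bytes, "
          f"padded {SEGMENT_SIZE - len(original)} bytes with 0x55")


# Only runs when called as: python script.py <saved_copy> <output>
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

The point of building the result in memory is that the original bytes can never be truncated or partly overwritten: everything that made it to disk is kept as-is, and only the missing tail is forced to look committed.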