Re: locate DB corruption

Dave Peticolas <dave@xxxxxxxxxx> · Sun, 2 Sep 2018 08:09:53 -0700

On Sun, Sep 2, 2018 at 4:51 AM Stephen Frost <sfrost@xxxxxxxxxxx> wrote:
> Greetings,
>
> * Dave Peticolas (dave@xxxxxxxxxx) wrote:
> > On Sat, Sep 1, 2018 at 5:09 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx>
> > wrote:
> >
> > > On 09/01/2018 04:45 PM, Dave Peticolas wrote:
> > >
> > > > Well restoring from a backup of the primary does seem to have fixed the
> > > > issue with the corrupt table.
> > >
> > > Pretty sure it was not that the table was corrupt but that transaction
> > > information was missing from pg_clog.
> > >
> > > In a previous post you mentioned you ran tar to do the snapshot of
> > > $PG_DATA.
> > >
> > > Was there any error when tar ran the backup that caused you problems?
> >
> > Well the interesting thing about that is that although the bad table was
> > originally discovered in a DB restored from a snapshot, I subsequently
> > discovered it in the real-time clone of the primary from which the backups
> > are made. So somehow the clone's table became corrupted. The same table was
> > not corrupt on the primary, but I have discovered an error on the primary
> > -- it's in the thread I posted today. These events seem correlated in time,
> > I'll have to mine the logs some more.
>
> Has this primary been the primary since inception, or was it promoted to
> be one at some point after first being built as a replica..?

It was the primary since inception. All the problems now appear to have stemmed from the primary due to a bug in 9.6.8 (see other thread). I've since upgraded to 9.6.10.