I wrote: > We recently found a couple of rows in a production database > that had identical values in the columns constituting the primary key > (The problem surfaced because a pg_dump could not be restored). > > Now I'm looking for explanations how this could happen. > > The rows originate from around the time when we had a hardware > failure that corrupted the file system. The database came up > after a file system check, and people continued working until > we noticed that some tables were corrupted. > > At that point we restored an online backup and recovered past > the time of the hardware failure. The WALs were intact and recovery > completed successfully. > > Now does the following explanation sound plausible: [...] For the record, I set up a small experiment and could reproduce the corruption like this: - initdb new cluster, start the postmaster - create a table with a primary key constraint - perform an online backup - stop the postmaster - cp -p pg_clog/0000 pg_clog/0000.bak - start the postmaster - insert a record, commit - stop the postmaster - mv pg_clog/0000.bak pg_clog/0000 - start the postmaster - Insert and commit a second record with the same primary key. This will succeed since the previous commit is "gone". - stop the postmaster - perform PITR from the online backup to the end of WALs The PITR will succeed, and you end up with two rows with the same primary key, but only the second one is in the unique index. Here's what I learned: - Don't let anybody work on a database after a hardware problem that affected the integrity of the file system, even if the postmaster comes up without complaining. - Never perform point-in-time-recovery beyond the time when the hardware problem happened. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general