Chris Richards wrote:
[had corruption in pg_depend indexes]
> I've observed the problem on tables other than "pg_depend". A normal table (called "blocks") had
> the problem. A "REINDEX TABLE blocks" did not fix the problem; however, a "VACUUM FULL" did "fix" it.

Did you dump/remove cluster/initdb/restore?
That's the only reliable way to get rid of corruption (a minimal sketch is in the PS below).

>> Are you running the latest minor release for your PostgreSQL version?
>
> No. 9.3.9, whereas the latest is 9.3.10.

I think no data corruption bugs have been discovered since 9.3.9.

>> Did you make sure that you have a reliable storage system?
>
> RAID-6 SSDs on ext4

Did you test if it is reliable?
Pulled the plug on the server during a busy checkpoint?
Or read http://brad.livejournal.com/2116715.html
(A limited software-only check is also sketched in the PS below.)

But that shouldn't apply to your case if there was no crash.

>> Were there any crashes recently?
>
> No[*]. See comments below about LVM snapshots.
>
> We had thought we figured out what might have caused the problem. We were taking LVM snapshots while
> the database was running, and at some later time we reverted the snapshot (while postgres was
> stopped). Our theory was that since postgres was running, we captured its backing store in an
> indeterminate state. We changed our snapshot process to shut down postgres, take the LVM snapshot,
> and then start postgres.

If the snapshot is truly atomic, it should work - that would be just like restarting
PostgreSQL after a server crash. PostgreSQL is designed to handle such a case well,
provided that the storage does what it promises, i.e. makes sure that data are really
on disk when an fsync() is issued.

Is the whole database, including pg_xlog, on one logical volume?
(The PS below also sketches such a shutdown/snapshot/start sequence.)

> Unfortunately, the problem cropped up again.
>
> We had an empty database except for our created tables, took the snapshot, and then populated the
> tables using normal INSERT/UPDATE/DELETEs while the system ran. Then we rebooted and reverted the
> LVM snapshot -- back to the empty database. We then populated the tables with a series of
> "COPY <table> FROM <file>" statements. The only special things about this are that we dropped one
> foreign key constraint and that we may issue back-to-back COPYs to the same table (each COPY
> pulling in different data, as we pre-process the data before COPYing it). The amount of data from
> our COPYs is relatively small:
>
> dirtable - 1 row
> dirpath - 1 row
> cloud - 940 rows
> blocks - 176 rows (before it died)
>
> Indexes were not disabled during the bulk import.

It "died" because of data corruption?
And you had PostgreSQL shut down when you took the snapshot?
Was this a new database cluster (not just a new database) created with initdb just before the test?

Now that should work even if your storage lies to you.
Unless there is something quite wrong with your storage - RAID checksums should prevent
the effects of a failing disk from affecting you.

Yours,
Laurenz Albe
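
PS: The dump/initdb/restore cycle referenced above, as a minimal sketch. The data
directory /var/lib/pgsql/9.3/data, the superuser name and the dump file path are
only examples - adjust them to your installation:

    # dump all databases and global objects while the old cluster is still running
    # (if the dump itself fails because of the corruption, you may have to salvage
    # the affected tables individually)
    pg_dumpall -U postgres -f /tmp/alldb.sql

    # stop the server and move the corrupted cluster out of the way
    pg_ctl -D /var/lib/pgsql/9.3/data stop -m fast
    mv /var/lib/pgsql/9.3/data /var/lib/pgsql/9.3/data.corrupt

    # create a fresh cluster, start it and restore the dump into it
    initdb -D /var/lib/pgsql/9.3/data
    pg_ctl -D /var/lib/pgsql/9.3/data start
    psql -U postgres -f /tmp/alldb.sql postgres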
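
The "limited software-only check" I mentioned: pg_test_fsync (shipped in contrib)
at least shows whether fsync() timings look plausible for your hardware; implausibly
fast numbers usually point at a write cache that is not battery-backed. It is no
substitute for an actual pull-the-plug test. The path is an example - put the test
file on the same volume as the data directory:

    # writes and fsyncs a scratch file, then reports the timings per sync method
    pg_test_fsync -f /var/lib/pgsql/9.3/data/pg_test_fsync.out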
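
And a sketch of the shutdown/snapshot/start sequence, assuming the whole cluster
(including pg_xlog) lives on a single logical volume /dev/vg0/pgdata mounted at
/var/lib/pgsql/9.3/data - the volume group, volume and snapshot names are made up,
and the snapshot size must cover whatever changes while the snapshot exists:

    # stop postgres so the snapshot captures a cleanly shut down cluster
    pg_ctl -D /var/lib/pgsql/9.3/data stop -m fast

    # snapshot the one volume that holds the entire cluster
    lvcreate --snapshot --size 10G --name pgdata_snap /dev/vg0/pgdata

    pg_ctl -D /var/lib/pgsql/9.3/data start

    # later, to revert: stop postgres, unmount the volume and merge the snapshot
    # back into its origin (the merge finishes when the origin is next activated,
    # e.g. after a reboot)
    # pg_ctl -D /var/lib/pgsql/9.3/data stop -m fast
    # umount /var/lib/pgsql/9.3/data
    # lvconvert --merge /dev/vg0/pgdata_snap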