Search Postgresql Archives

Database Corruption - last chance recovery options?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Had some database corruption problems today. Since they came on the heels of making some minor database changes yesterday, they may or may not be related to that. Centos 4.x, Postgresql 8.1.4

I modified the following settings and then issued a reload. I hadn't turned up the kernel.shmmax to allow for these bigger memory settings, but the database continued to run fine.

shared_buffers = 10000
work_mem = 2048
autovacuum = on                        # enable autovacuum subprocess?
autovacuum_naptime = 60 # time between autovacuum runs, in secs
autovacuum_vacuum_threshold = 1000     # min # of tuple updates before
autovacuum_analyze_threshold = 500     # min # of tuple updates before
autovacuum_vacuum_scale_factor = 0.4   # fraction of rel size before
autovacuum_analyze_scale_factor = 0.2  # fraction of rel size before
autovacuum_vacuum_cost_delay = -1      # default vacuum cost delay for
autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for

After reloading I ran a number of vacuumdb -a -z which completed successfully.

Sometime after the vacuum or reload one of our clients started to have database problems. The other databases in the same postgresql on the server don't seem to be affected.

When I finally got the error report in the morning the database was in this state:

$ psql dbname

dbname=# \dt
ERROR:  cache lookup failed for relation 20884

Doing a select * from pg_tables seemed to indicate that some of the tables were no longer in the database, also some other tables were inaccessible.

I made a backup and then some functionality was restored by issuing a reindex system dbname

Using the "broken database" pg_dump on all tables in pg_table gives this for some tables:

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: cache lookup failed for relation 20871 pg_dump: The command was: SELECT tableoid, oid, conname, pg_catalog.pg_get_const raintdef(oid) as condef FROM pg_catalog.pg_constraint WHERE conrelid = '20876'::
pg_catalog.oid AND contype = 'f'
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: relation "public.auth_groups_permissions" does not exist pg_dump: The command was: LOCK TABLE public.auth_groups_permissions IN ACCESS SH
ARE MODE
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: relation with OID 21186 does not exist pg_dump: The command was: LOCK TABLE public.ght_ght_shippingorders IN ACCESS SHA
RE MODE

Our backups failed of course, and we have made a good attempt at recovery, which we are willing to accept as all the recovery we can do.

As a last chance to get some of the data back, I would be interested if there is any way to read through the raw database files to see if I can recover some more data from them.

I'm also curious if any of the settings/reload caused this problem, or perhaps the vacuum or autovacuum is what caused our error. I'll file a bug report if it's somehow repeatable.

Oh, and make backups.

-Mike


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux