Database corruption

We have recently upgraded from PostgreSQL 7.4.5 to 8.1.4. Our DB is about 45GB in size and has about 100 tables, plus lots of stored procedures, triggers, etc.

The DB lives on a SAN and is accessed via Fibre Channel cards in an IBM BladeCenter. The system is running Red Hat Enterprise Linux 3, I think. We connect to it using the latest JDBC driver under Java 1.5.

Here is the problem. Last night I saw that one of the statements in my code had thrown this exception:

net.za.pwnconsulting.dblayer.query.DBQueryLayerException: Could not execute SQL query [select c.mc_ca_id as c_mc_ca_id, d.mc_ca_id as d_mc_ca_id, b.ca_id, b.ca_ctr_id from contract a, contract_agreement b left outer join monthend_contract c on c.mc_ca_id = ? and c.mc_last_proc_date = ? left outer join monthend2_contract d on d.mc_ca_id = ? and d.mc_last_proc_date = ? where a.ctr_id = b.ca_ctr_id and b.ca_agreement_type = ? and b.ca_start_date > ? and b.ca_start_date <= ? and a.ctr_id = ? and ca_level_company = ? and ca_level_data = ? and ca_level_type = 'V' and ca_id <> ? limit 1] in pool [Oasis] because: ERROR: out of memory
  Detail: Failed on request of size 20.
org.postgresql.util.PSQLException: ERROR: out of memory
  Detail: Failed on request of size 20.
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1525)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1309)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:340)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:239)
	at net.za.pwnconsulting.dblayer.query.StandardQueryLayer.queryDatabaseInternal(StandardQueryLayer.java:289)
	at net.za.pwnconsulting.dblayer.query.StandardQueryLayer.queryDatabaseInternalExceptionWrapper(StandardQueryLayer.java:158)
	at net.za.pwnconsulting.dblayer.query.StandardQueryLayer.queryDatabase(StandardQueryLayer.java:68)
	at net.za.pwnconsulting.dblayer.query.StandardQueryLayer.queryDatabase(StandardQueryLayer.java:58)
	at net.za.pwnconsulting.dblayer.AbstractDBLayer.queryDatabase(AbstractDBLayer.java:57)
	at za.co.massmart.oasis.daemons.ContractExtensionTimerTask.safeExecuteInternal(ContractExtensionTimerTask.java:183)
	at za.co.massmart.oasis.daemons.ExtendedHACronTimerTask.safeExecute(ExtendedHACronTimerTask.java:94)
	at net.za.pwnconsulting.javaconfig.utils.time.HACronTimerTask.execute(HACronTimerTask.java:156)
	at net.za.pwnconsulting.javaconfig.utils.time.CronTimerTask.run(CronTimerTask.java:103)
	at java.lang.Thread.run(Thread.java:595)

So I checked and saw that another application, connecting to another DB in the same instance, was showing a very weird error. For an "update tablename set field1 = X where primary_field = Y" statement I got a duplicate key exception on the serial field "primary_field" - which is impossible, since I am not changing its value; I am updating a different column. Vacuum full analyze did not fix this. I dropped and recreated the index, and then the error went away. This was just after I had restarted the postgres instance (I stopped it using "pg_ctl stop -m immediate" since there were connections I could not close from other systems).
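For reference, the index rebuild was roughly the following (the table, column, and index names below are placeholders, not the real ones); REINDEX INDEX would presumably have done the same job:

    -- Drop and recreate the unique index that was raising the bogus
    -- duplicate-key errors (placeholder names).
    DROP INDEX tablename_primary_field_idx;
    CREATE UNIQUE INDEX tablename_primary_field_idx ON tablename (primary_field);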

I restarted the application that encountered the first error above and ran the task again - it worked. So I then started a vacuum full analyze on the main production DB. It produced several warnings - about page sizes not being large enough, I think - but it completed successfully.
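In other words, roughly this, run against the main production DB:

    -- Full vacuum of every table plus fresh planner statistics.
    VACUUM FULL ANALYZE;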

This morning we found that some data generated at 01:00 this morning was present, but most of the tables had lost all data captured since 25 August 2006. That amounts to about 500 contracts, a whole billing cycle, etc. We upgraded PostgreSQL on the 20th.

I checked the file system and it yielded no errors (ext3). I checked the server and there were no obvious hardware issues.

How can part of the DB just be lost like that? I have just verified, by restoring a backup made last night at 20:00 - before the errors I showed above happened - that the restored DB is also missing the data. The bill run happened yesterday afternoon, so something broke somewhere in between. Any ideas on how to troubleshoot this?
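The check used to confirm the gap was along these lines (the timestamp column is a placeholder name, but the pattern is the same for the affected tables):

    -- Latest captured row stops at 25 August 2006 in both the live DB
    -- and the restored backup (placeholder column name).
    SELECT max(capture_date) FROM contract;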

To summarise - the DB lost about 6 days of transactions for no known reason.

