
database corruption

We are running a 64-bit PostgreSQL 9.4.5 server on Windows Server 
2012. The system is a virtual machine on a VMware ESX 6.0 server and 
has 24 GB of memory. The database server is only accessed locally by 
two services, and there is only a single database in the server. The 
disk is located on storage that is shared with many other servers. 
The database server has fsync enabled.
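
For reference, this is roughly how we check the write-safety settings 
from psql (fsync is confirmed on; the other parameters are just the 
ones that seemed worth looking at, not values I can vouch for here):

    -- Sketch: checking the durability-related settings from psql.
    -- fsync is on; the remaining values would need to be re-checked.
    SHOW fsync;
    SHOW full_page_writes;
    SHOW synchronous_commit;
    SHOW wal_sync_method;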

A few weeks ago we already had a data corruption when the disk was 
full. There are other services running on the same machine that can 
cause the disk to fill up (e.g. local caching when the network is 
acting up). That has happened a few times so far and the database was 
never compromised; in that one case we thought it had been, but 
fortunately we only lost a huge table/TOAST relation (300+ GB) that 
stores very verbose data which is not essential. That happened with 
an earlier 9.4 version.

Today we encountered another data corruption after the disk was full. 
It is much worse this time around, since data that is essential for 
the applications using the database is affected. After truncating the 
300+ GB table already mentioned above, all the services were 
restarted and one of the applications failed to start with the 
following database error:

Caused by: org.postgresql.util.PSQLException: ERROR: invalid memory alloc request size 18446744073709551613
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2182)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1911)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:173)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:622)
	at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:472)
	at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:386)
	at joulex.cloud.data.tables.JxObjectStoreJsonTable.getAllJsonsAsHashMap(JxObjectStoreJsonTable.java:215)
	... 16 more
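
If it helps to narrow this down: as far as I understand, this error 
usually points to a corrupted length header in a stored value, and 
one way to find the affected rows is to walk the table by ctid and 
see which tuples can no longer be read. A minimal sketch of what we 
could run (the table name and the approach are an assumption on my 
part, not something we have done yet):

    -- Hypothetical sketch: walk a suspect table row by row via ctid
    -- and report the tuples whose data can no longer be read.
    -- "the_table" is a placeholder for the real relation name.
    DO $$
    DECLARE
        r record;
        dummy text;
    BEGIN
        FOR r IN SELECT ctid FROM the_table LOOP
            BEGIN
                -- casting the whole row to text forces all columns
                -- (including TOASTed ones) to be read
                SELECT t::text INTO dummy
                FROM the_table t
                WHERE t.ctid = r.ctid;
            EXCEPTION WHEN OTHERS THEN
                RAISE NOTICE 'damaged tuple at ctid %: %', r.ctid, SQLERRM;
            END;
        END LOOP;
    END $$;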

Checking the database logs, the problems seem to have started two 
days ago:
2016-02-10 16:00:34 GMTERROR:  invalid page in block 1255 of relation base/16387/3634911224
2016-02-10 16:00:34 GMTCONTEXT:  automatic vacuum of table "cloud.t_63d1c1eb806f4e25904b659b75176093.y2016_utilization"
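
In case it is relevant, this is roughly how I would map the file node 
from that log line back to a relation name (run while connected to 
the database with OID 16387); the CONTEXT line already names the 
table, so this is only to double-check:

    -- Sketch: map the file node from "base/16387/3634911224" back to
    -- a relation in the current database.
    SELECT relname, relkind
    FROM pg_class
    WHERE relfilenode = 3634911224;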

For some reason it doesn't seem to be caused by the disk being full, 
since the database server was still able to write another 8 GB of log 
files over the next two days and the system was still working 
yesterday afternoon.
It also doesn't appear to be a hardware problem, since all the other 
systems sharing the virtual host system and the storage show no 
issues at all.

Unfortunately we don't have a recent backup of the database (a tool 
to back up all the relevant data was only recently finished and had 
not been set up for this system yet).
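
One thing I am considering, unless there is a better approach, is to 
copy out whatever is still readable, table by table, roughly like 
this (the table name and output path are placeholders; per-table 
pg_dump runs would serve the same purpose):

    -- Sketch: copy out the tables that still read cleanly, one at a
    -- time. A server-side COPY TO a file requires superuser.
    COPY some_intact_table TO 'C:/salvage/some_intact_table.copy';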

Something else worth noting is that we disabled autovacuum on the 
TOAST table of the 300+ GB table, since we perform INSERT INTO on 
that table and the vacuum on it was causing a performance hit. The 
anti-wraparound autovacuum for it is still performed from time to 
time, and that autovacuum was still running after the machine ran 
out of disk space and the services were restarted.
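
For completeness, autovacuum was turned off on the TOAST part with a 
per-table storage parameter, roughly like this (the table name is a 
placeholder for the real 300+ GB table):

    -- Roughly how autovacuum was disabled for the TOAST table of the
    -- big table (placeholder name). Anti-wraparound autovacuum still
    -- runs regardless of this setting.
    ALTER TABLE big_verbose_table SET (toast.autovacuum_enabled = false);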

Any help in recovering the data is appreciated, and if more 
information is needed I will try to provide it. Thanks in advance.

