
Re: Lost rows/data corruption?


 



fsync is on for all these boxes. Our customers run their own hardware, with many different hardware specifications in use. Many of our customers don't have a UPS, although their power is probably pretty reliable (normal city-based utilities), but of course I can't guarantee they don't get an outage once in a while during a thunderstorm etc.

The problem here is that we are consistently seeing the same kind of corruption and symptoms across a fairly large number of customers (52 have reported this problem), so there is something endemic happening here that, to be honest, I'm surprised no one else is seeing. Fundamentally there is nothing particularly abnormal about our application or data, but regardless, I would have thought these kinds of things (application design, data representation etc.) irrelevant to the database's ability to reject duplicate data on a primary key. Something is causing this corruption, and one thing we do know is that it doesn't happen immediately with a new installation; it takes time (several months of usage) before we start to see this condition. I'd be really surprised if XFS is the problem, as I know there are plenty of other people across the world using it reliably with PG.

We're going to see if we can build a test environment that can forcibly cause this, but I don't hold out much hope, as we've tried to isolate it before with little success. Here's what we tried changing when we originally went searching for the problem, and it is still here:

- the hardware (tried single CPU instead of dual - though that may be an issue with the OS)
- the OS version (tried Linux 2.6.5, 2.6.6, 2.6.7, 2.6.8.1, 2.6.10 and 2.4.22) - all using XFS
- the database table layout (tried changing the way the data is stored)
- the version of Jetty (servlet engine)
- the DB pool manager and PG JDBC driver versions
- the version of PG (tried two or three back from the latest)
- various vacuum regimes



----- Original Message ----- From: "Marco Colombo" <pgsql@xxxxxxxxxx>
To: "Andrew Hall" <temp02@xxxxxxxxxxxxxxx>
Cc: <pgsql-general@xxxxxxxxxxxxxx>
Sent: Wednesday, February 16, 2005 2:58 AM
Subject: Re: Lost rows/data corruption?



On Tue, 15 Feb 2005, Andrew Hall wrote:



It sounds like a mess, all right.  Do you have a procedure to follow to
replicate this havoc?  Are you sure there's not a hardware problem
underlying it all?

regards, tom lane


We haven't been able to isolate what causes it but it's unlikely to be
hardware as it happens on quite a few of our customer's boxes. We also use
XFS on linux 2.6 as a file system, so the FS should be fairly tolerant to
power-outages. Any ideas as to how I might go about isolating this? Have you
heard any other reports of this kind and suggested remedies?

Are you running with fsync = off? and did the hosts experience any power-outage recently?

.TM.
--
      ____/  ____/   /
     /      /       / Marco Colombo
    ___/  ___  /   /       Technical Manager
   /          /   / ESI s.r.l.
 _____/ _____/  _/        Colombo@xxxxxx



