Just a short follow-up on this problem.
We've moved the database to another server, where it has run without problems.
HP has just released new RAID controller drivers for SUSE and a firmware
update for the controller itself.
So far the problem has not recurred.
Thanks!
Jo.
Chris Travers wrote:
Jo De Haes wrote:
OK. The saga continues, everything is a little bit more clear, but at
the same time a lot more confusing.
Today I wanted to reproduce the problem again. And guess what? A
vacuum of the database went through without any problems.
I dumped the block I was having problems with yesterday. It no longer
reports an invalid header, and it contains different data!
Inconsistent problems, especially with PostgreSQL, are usually the result of
hardware failure.
Turns out the data that was returned yesterday belongs to another
database!
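For anyone who wants to repeat this kind of comparison, here is a minimal sketch of reading one block from a relation file and hashing it, so that two reads of the same block can be compared. It assumes the default 8192-byte block size; `read_block` and `block_digest` are hypothetical helper names, not PostgreSQL tools. The comparison only means something while the server is stopped, since a live database rewrites blocks as a matter of course, and the operating system's page cache may serve a repeated read without touching the controller at all.

```python
import hashlib

# Assumption: the server was built with PostgreSQL's default block size.
BLOCK_SIZE = 8192


def read_block(path, block_number, block_size=BLOCK_SIZE):
    """Return the raw bytes of one block from a relation file."""
    with open(path, "rb") as f:
        f.seek(block_number * block_size)
        data = f.read(block_size)
    if len(data) != block_size:
        # Either the block lies past the end of the file or the file is truncated.
        raise ValueError("block %d is missing or incomplete in %s" % (block_number, path))
    return data


def block_digest(path, block_number, block_size=BLOCK_SIZE):
    """Hex SHA-256 digest of one block, for comparing reads taken at different times."""
    return hashlib.sha256(read_block(path, block_number, block_size)).hexdigest()
```

If the digest of the same block differs between two reads with the server shut down, the change happened below PostgreSQL, which would point towards the storage stack or memory rather than the database.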
Some more detail about the setup. This server runs 2 instances of
postgresql. One is a production instance, version 8.0.3. The other is a
testing instance, installed in a different folder, which runs
version 8.1.3. Am I wrong in thinking this setup ought to work?
No. I have done it before too. PostgreSQL instances running on
different ports or addresses are sufficiently isolated to prevent this
from being a problem.
Both instances use completely separate data folders.
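A quick way to double-check that claim is sketched below, assuming a POSIX system and two data directory paths that you supply yourself. The helper names are hypothetical. The only PostgreSQL-specific detail relied on is the `PG_VERSION` file that initdb writes at the top of each data directory, which holds the major version (for example 8.0 or 8.1).

```python
import os


def data_dirs_overlap(dir_a, dir_b):
    """True if the two directories are the same or one is nested inside the other."""
    # realpath resolves symlinks, so two different-looking paths that
    # point at the same place are still detected.
    a = os.path.realpath(dir_a)
    b = os.path.realpath(dir_b)
    return os.path.commonpath([a, b]) in (a, b)


def read_pg_version(data_dir):
    """Return the major version recorded in the data directory's PG_VERSION file."""
    with open(os.path.join(data_dir, "PG_VERSION")) as f:
        return f.read().strip()
```

If `data_dirs_overlap` returns False and `read_pg_version` reports the expected major version for each directory, the two instances are at least not sharing files at the filesystem level. It says nothing about what the controller does underneath.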
So the first dump returned data that actually belongs to an 8.0.3
database (that runs fine). And today without _any_ intervention that
same block returns the correct data and the complete database is fine.
Where is the problem?
The fact that I'm running 2 different instances?
Cache on the RAID controller messing up?
Some strange voodoo?
I would see what sort of memory testing suite you can run on your system
first (memtest86, for example) and go from there. It sounds to me like
some sort of a hardware issue. It *could* be bits flipped anywhere,
from the write head on the disk to the main system memory or the CPU.
The likelihood that it is a random RAM error is reduced if you are using
ECC RAM. Otherwise it could be anything.
That being said, when I have seen bits flipped by the CPU, you usually
get a lot of index issues and shared memory corruption, so I would be
more inclined to think that this was RAM or RAID cache.
Best Wishes,
Chris Travers
Metatron Technology Consulting