Re: db corruption/recovery help

Scott Marlowe <smarlowe@xxxxxxxxxxxxxxxxx> · Mon, 06 Jun 2005 16:56:04 -0500

On Mon, 2005-06-06 at 16:39, Ed L. wrote:
> On Monday June 6 2005 3:29 pm, Ed L. wrote:
> > On Monday June 6 2005 3:17 pm, Scott Marlowe wrote:
> > > On Mon, 2005-06-06 at 15:16, Ed L. wrote:
> > > > Someone flipped a breaker switch, and evidently triggered
> > > > corruption in one of our major clusters:
> > >
> > > OK, if postgresql is running on hardware that does NOT lie
> > > about fsyncing, and it is set to fsync, this should NEVER
> > > happen.
> >
> > This is 7.3.4 running on an HP-UX 11.00 9000/800 PA-RISC box
> > with fsync = TRUE, built with gcc 3.2.2.  Database is entirely
> > on a SAN.
> >
> > We got very lucky:  the corrupted database was expendable
> > (essentially a log database).  I was able to just move the
> > data/base/NNNN directory off to the side, restart, drop the
> > corrupted db, and recreate schema...
> 
> The SAN never lost power, only the system itself.  I'd really 
> like to chase this to the root if possible.  Ideas?

It sounds like something between PostgreSQL and the SAN connector on
the back of the box is lying about fsync. I'm not familiar with many
SAN setups, so you might want to describe how yours is configured and
see whether anyone else knows more about them than I do.
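
One rough way to look for a layer that acknowledges fsync too early is
to time a loop of small synchronous writes on the volume in question.
The sketch below is only an illustration, not a PostgreSQL tool: the
function name, the probe file path, and the write count are all made up
for the example, and the 8 kB block simply mirrors PostgreSQL's default
page size. A single spinning disk with no write cache can complete
roughly one synchronous write per rotation, so on the order of a
hundred per second at 7200 rpm. A much higher number means some cache
is absorbing the writes. On a SAN that may be a legitimate
battery-backed cache, so a high rate is a hint to investigate, not
proof of a problem, and only a real power-loss test can confirm
durability.

```python
import os
import time


def measure_fsync_rate(path, writes=200, block=b"x" * 8192):
    """Return the number of write+fsync cycles completed per second.

    Hypothetical helper for illustration: `path` should live on the
    same volume as the database files being investigated.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, block)   # append one 8 kB block
            os.fsync(fd)          # ask the OS to push it to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Guard against a zero interval on very coarse clocks.
    return writes / elapsed if elapsed > 0 else float("inf")


# Example use (the path is an assumption; pick one on the SAN volume):
#   rate = measure_fsync_rate("/path/on/san/fsync_probe.dat")
#   print("%.0f fsyncs/sec" % rate)
```

Comparing the rate on the SAN volume against a local disk on the same
box may help narrow down which layer is doing the caching.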
