Re: Real application clustering in postgres.

Laurenz Albe <laurenz.albe@xxxxxxxxxxx> · Mon, 09 Mar 2020 09:52:49 +0100

On Fri, 2020-03-06 at 10:56 -0600, Ron wrote:
> > > > RAC is not really a high availability solution: because of the shared
> > > > storage, it has a single point of failure.
> > > This is utter nonsense.  Dual redundant storage controllers
> > > connected to disks in RAID-10 configurations have been around for at
> > > least 25 years.
> > > 
> > > Oracle got its clustering technology from DEC, and I know
> > > that works.  Cluster members, storage controllers and disks have all
> > > gone down, while the database and application keep on humming along.
> >
> > I am not saying that it is buggy, it is limited by design.
> > 
> > If you have mirrored disks, and you write junk (e.g., because of
> > a flaw in a fibre channel cable, something I have witnessed),
> > then you have two perfectly fine copies of the junk.
> 
> Why do you have just one FC path?

We didn't.
The cable that the data happened to be sent over was faulty.

> > I am not saying the (physical) disk is the single point of failure, the
> > (logical) file system is (Oracle calls it ASM / tablespace, but it is
> > still a file system).
> 
> Why isn't the filesystem (or RDBMS) throwing checksum errors?  This was 
> standard stuff in legacy Enterprise RDBMSs 20 years ago.

Checksums are nice for telling you that your storage is screwed.
They don't fix the problem.
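
To illustrate the point, here is a minimal sketch in Python. It is a toy
model, not how Oracle or PostgreSQL actually handle I/O: the function
write_mirrored and the single-byte fault are made up for the example. It
shows that data damaged on the way to mirrored storage ends up as two
identical bad copies, and that a checksum can flag the damage without
being able to recover the original.

```python
import zlib


def write_mirrored(block: bytes, corrupt_in_transit: bool = False):
    """Toy model: one write travels over a single path, then lands on two mirrors."""
    if corrupt_in_transit:
        # Hypothetical fault: the first byte is flipped before the data
        # reaches the storage layer (think of a bad cable).
        block = bytes([block[0] ^ 0xFF]) + block[1:]
    # Both mirrors faithfully store whatever arrived.
    return block, block


original = b"account balance: 100"
expected_crc = zlib.crc32(original)  # checksum computed by the sender

disk_a, disk_b = write_mirrored(original, corrupt_in_transit=True)

mirrors_agree = disk_a == disk_b                  # mirroring worked as designed
checksum_ok = zlib.crc32(disk_a) == expected_crc  # detection only, no repair

print("mirrors agree:", mirrors_agree)
print("checksum ok:  ", checksum_ok)
```

In this model the mirrors agree with each other and the checksum fails on
both, so the only way back to the original data is a separate copy such as
a backup or a standby with its own storage.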

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com