Re: How to design for cheap reliability with PostgreSQL?

Richard Huxton <dev@xxxxxxxxxxxx> · Tue, 02 Dec 2008 11:59:36 +0000

Michael Meier wrote:
> Hi,
> 
> I am in the process of designing a distributed application (Java EE 5)
> which has demand for database functionality at several points. Some of
> these databases need to provide really mission critical reliability,
> i.e. it would be the end of our world, if we would suffer from data
> loss or silent data corruption. Availabilty or servicability are nice
> but not that important. This means that it isn't that important that a
> transaction is committed, but if the transaction is committed, the
> data should be 100% safe - without any chance of data loss or
> corruption.

I'm assuming you mean 99.99..9% safe.

> On the other hand, real proper reliability is expensive and we do not
> have the money for serious hardware or "unbreakable" database
> licences. I assume a DB size of < 50GB with moderate write I/O and
> less moderate read I/O, so I would like to get a machine with 64 GB
> Ram for in-memory caching.

An "unbreakable" database isn't necessarily going to help you with the
sort of errors you're thinking of.

> I am thinking of PostgreSQL 8.3.n with n>=5  on top of Solaris 10 with ZFS.

Not sure if it's of interest, but there's been discussion of block-level
checksums for 8.4. You might want to check the archives for the -hackers
mailing list. I've never used ZFS, but from what I've read it would make
sense.

> So an extremely cheap low end hardware platform for the problem might
> look like this:

You and I are at different scales of "cheap" :-)

> Dual Quad Core Xeon Machine
> 64 GB FB-DIMM ECC-Ram
> Two 8-Port SAS Controller
> Internal 16 x 73 GB 15k hard Drives, partitioned as follows:
> - 4 drives:  3-way raid-1 zfs mirror with one additional hot spare for Solaris
> - 6 drives: 4-way raid-1 zfs mirror with two additional hot spares for
> Postgresql Tables
> - 6 drives: 4-way raid-1 zfs mirror with two additional hot spares for
> Postgresql Logging

That's a lot of disks for a 50GB database, unless you've got a *lot* of
writes.

> ZFS-Scrubbing every night.
> approx. 24.000$ list price from Sun (x4250) with Sun Platinum support
> for the hardware and Solaris,
> or approx. 12.000$ self-assembled from newegg.

Well, several of either must be preferable to your expensive version.

> And an extremely expensive high-end solution for the problem might
> look like this:
> Sun SPARC Enterprise M4000 Server with two SPARC64 VII CPUs
> 64GB registered ECC-Ram
> Sun M-Series RAS-Features (Checksumming of CPU-Registers, etc.)
> An external Drive Array (e.g. J4400) with 16 drives. Partioned like
> the internal drives above.
> ZFS-Scrubbing every night.
> approx. 160.000$
> 
> This insane price difference would get us data integrity beyond the
> ZFS checksumming feature: It would protect the data even in the CPU
> registers.
> 
> So the questions are:
> 
> 1) Is this necessary? Do bit errors happen with configurations like
> the cheap xeon one above?

You'll probably want to estimate the odds of earthquake/volcano/nuclear
war and see how much effort it's worth.

> 2) If this is necessary, is there any (software) way around it? Is
> there a PostgreSQL clustering solution available, where the cluster
> nodes check each others data integrity?

Hmm - there are commercial variants of PG that do clustering/replication
(Greenplum and EnterpriseDB for a start) but I'm not sure if it's quite
what you're after.

I don't know what sort of queries you're running, but you could do
something like:
- pg-pool to send queries to two front-end servers
- both replicate to a third server over slony, which compares the
replicated copies (synchronisation issues here though)
- you keep the replication stream or keep separate WAL archives so if a
problem is found you stop everything and rewind until you get to a known
good point.

Of course, all this just introduces more code with more chances of a
bug. It would *look* more reliable, but I'm not sure it would be.

HTH

-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general