Michael Meier wrote: > Hi, > > I am in the process of designing a distributed application (Java EE 5) > which has demand for database functionality at several points. Some of > these databases need to provide really mission critical reliability, > i.e. it would be the end of our world, if we would suffer from data > loss or silent data corruption. Availabilty or servicability are nice > but not that important. This means that it isn't that important that a > transaction is committed, but if the transaction is committed, the > data should be 100% safe - without any chance of data loss or > corruption. I'm assuming you mean 99.99..9% safe. > On the other hand, real proper reliability is expensive and we do not > have the money for serious hardware or "unbreakable" database > licences. I assume a DB size of < 50GB with moderate write I/O and > less moderate read I/O, so I would like to get a machine with 64 GB > Ram for in-memory caching. An "unbreakable" database isn't necessarily going to help you with the sort of errors you're thinking of. > I am thinking of PostgreSQL 8.3.n with n>=5 on top of Solaris 10 with ZFS. Not sure if it's of interest, but there's been discussion of block-level checksums for 8.4. You might want to check the archives for the -hackers mailing list. I've never used ZFS, but from what I've read it would make sense. > So an extremely cheap low end hardware platform for the problem might > look like this: You and I are at different scales of "cheap" :-) > Dual Quad Core Xeon Machine > 64 GB FB-DIMM ECC-Ram > Two 8-Port SAS Controller > Internal 16 x 73 GB 15k hard Drives, partitioned as follows: > - 4 drives: 3-way raid-1 zfs mirror with one additional hot spare for Solaris > - 6 drives: 4-way raid-1 zfs mirror with two additional hot spares for > Postgresql Tables > - 6 drives: 4-way raid-1 zfs mirror with two additional hot spares for > Postgresql Logging That's a lot of disks for a 50GB database, unless you've got a *lot* of writes. > ZFS-Scrubbing every night. > approx. 24.000$ list price from Sun (x4250) with Sun Platinum support > for the hardware and Solaris, > or approx. 12.000$ self-assembled from newegg. Well, several of either must be preferable to your expensive version. > And an extremely expensive high-end solution for the problem might > look like this: > Sun SPARC Enterprise M4000 Server with two SPARC64 VII CPUs > 64GB registered ECC-Ram > Sun M-Series RAS-Features (Checksumming of CPU-Registers, etc.) > An external Drive Array (e.g. J4400) with 16 drives. Partioned like > the internal drives above. > ZFS-Scrubbing every night. > approx. 160.000$ > > This insane price difference would get us data integrity beyond the > ZFS checksumming feature: It would protect the data even in the CPU > registers. > > So the questions are: > > 1) Is this necessary? Do bit errors happen with configurations like > the cheap xeon one above? You'll probably want to estimate the odds of earthquake/volcano/nuclear war and see how much effort it's worth. > 2) If this is necessary, is there any (software) way around it? Is > there a PostgreSQL clustering solution available, where the cluster > nodes check each others data integrity? Hmm - there are commercial variants of PG that do clustering/replication (Greenplum and EnterpriseDB for a start) but I'm not sure if it's quite what you're after. I don't know what sort of queries you're running, but you could do something like: - pg-pool to send queries to two front-end servers - both replicate to a third server over slony, which compares the replicated copies (synchronisation issues here though) - you keep the replication stream or keep separate WAL archives so if a problem is found you stop everything and rewind until you get to a known good point. Of course, all this just introduces more code with more chances of a bug. It would *look* more reliable, but I'm not sure it would be. HTH -- Richard Huxton Archonet Ltd -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general