On Wed, 1 Apr 2009, Scott Carey wrote:
On 4/1/09 9:54 AM, "Scott Marlowe" <scott.marlowe@xxxxxxxxx> wrote:
On Wed, Apr 1, 2009 at 10:48 AM, Stef Telford <stef@xxxxxxxxx> wrote:
Scott Marlowe wrote:
On Wed, Apr 1, 2009 at 10:15 AM, Stef Telford <stef@xxxxxxxxx> wrote:
I do agree that the benefit is probably from write-caching, but I
think that this is a 'win' as long as you have a UPS or BBU adaptor,
and really, in a prod environment, not having a UPS is... well, crazy?
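To make the write-caching trade-off concrete, here is a minimal Python sketch of what a "sync on" write does at the OS level. It is not PostgreSQL's code; the function name `durable_append` and the log file name are hypothetical. The comments note the limit the rest of this thread argues about: fsync only helps if every cache below it is honest or battery-backed.

```python
import os
import tempfile


def durable_append(path: str, data: bytes) -> None:
    """Append data to path and fsync before returning (the "sync on" case)."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may sit only in the OS page cache.
        # Even with it, a drive or controller whose volatile write cache
        # acknowledges early can still lose the data on power failure.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "commit.log")
        durable_append(log, b"tx 1\n")
        durable_append(log, b"tx 2\n")
        with open(log, "rb") as f:
            print(f.read())
```

Dropping the `os.fsync` call is roughly the "sync off" case: much faster, but a crash can lose whatever was still cached.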
You do know that UPSes can fail, right? En masse sometimes even.
Hello Scott,
Well, the only time the UPS has failed in my memory, was during the
great Eastern Seaboard power outage of 2003. Lots of fond memories
running around Toronto with a gas can looking for oil for generator
power. This said though, anything could happen, the co-lo could be taken
out by a meteor and then sync on or off makes no difference.
A meteor strike is far less likely than a power surge taking out a UPS.
I saw a whole data center go black when a power conditioner blew out,
taking out the other three power conditioners, both industrial UPSes
and the switch for the diesel generator. And I have friends who have
seen the same type of thing before as well. The data is the most
expensive part of any server.
Yeah, well I've had a RAID card die, which broke its battery-backed cache.
They're all unsafe, technically.
In fact, not only are battery-backed caches unsafe, but so are hard drives: they
can return bad data. So if you want to be really safe:
1: don't use Linux -- you have to use something with full data and metadata
checksums like ZFS or very expensive proprietary file systems.
this will involve other tradeoffs
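The point about full data checksums can be illustrated with a small Python sketch: store a checksum per block on write and verify it on every read, so a drive silently returning bad data is detected instead of passed upward. This is an illustration only. CRC32 and the 4096-byte block size are assumptions chosen for simplicity; ZFS uses its own checksum algorithms and keeps each checksum in the parent block pointer rather than next to the block.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size for illustration


def write_blocks(data: bytes) -> list[tuple[bytes, int]]:
    """Split data into blocks and pair each block with its CRC32."""
    blocks = []
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        blocks.append((block, zlib.crc32(block)))
    return blocks


def read_blocks(blocks: list[tuple[bytes, int]]) -> bytes:
    """Reassemble data, refusing to return any block whose checksum fails."""
    out = bytearray()
    for index, (block, checksum) in enumerate(blocks):
        if zlib.crc32(block) != checksum:
            raise ValueError(f"checksum mismatch in block {index}")
        out += block
    return bytes(out)
```

Detection is only half of it: with mirrored copies, a filesystem can also fetch the good copy and repair the bad one.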
2: combine it with mirrored SSDs that don't use write cache (so you can
have fsync perf about as good as a battery-backed RAID card without that
risk).
they _all_ have write caches. a beast like you are looking for doesn't
exist
4: keep a live redundant system with a PITR backup at another site that can
recover in a short period of time.
a good option to keep in mind (and when the new replication code becomes
available, that will be even better)
3: Run in a datacenter well underground with a plutonium nuclear power
supply. Meteor strikes and Nuclear holocaust, beware!
at some point all that will fail
but you missed point #5 (in many ways a more important point than the
others that you describe)
switch from using postgres to using a database that can do two-phase
commits across redundant machines so that you know the data is safe on
multiple systems before the command is considered complete.
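The two-phase commit idea can be sketched in a few lines of Python. This is a toy, in-memory illustration, not any particular database's implementation: the `Replica` class, its `healthy` flag and the function names are hypothetical, and a real participant must durably log its prepared state so it survives a crash between the two phases.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)     # txid -> pending writes
    committed: dict = field(default_factory=dict)  # applied key/value data

    def prepare(self, txid, writes) -> bool:
        """Phase 1: stage the writes and vote yes, or vote no if unavailable."""
        if not self.healthy:
            return False
        self.staged[txid] = dict(writes)
        return True

    def commit(self, txid) -> None:
        """Phase 2: apply previously staged writes."""
        self.committed.update(self.staged.pop(txid))

    def abort(self, txid) -> None:
        self.staged.pop(txid, None)


def two_phase_commit(replicas, txid, writes) -> bool:
    """Report success only after every replica has agreed and committed."""
    prepared = []
    for replica in replicas:
        if replica.prepare(txid, writes):
            prepared.append(replica)
        else:
            # One "no" vote aborts the transaction everywhere.
            for p in prepared:
                p.abort(txid)
            return False
    for replica in replicas:
        replica.commit(txid)
    return True
```

The caller gets `True` only once the data is on every replica, which is the guarantee described above; the price is an extra network round trip per commit.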
David Lang
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)