My reply about server failure was showing what could go wrong at the server level, assuming a first-class, properly run data center with fully redundant power, including a server with dual power supplies on separate cords fed by separate UPSes, etc.
Unfortunately, correctly configured A/B power is all too rare these days. Some examples of foo that I've seen at professional data centers:
- Allegedly "A/B" power supplied from two phases of the same UPS (which was then taken down due to a tech's error during "hot" maintenance)
- "A/B" power fed through a common switch panel
- A/B power with dual attached servers, with each power feed running a steady 60% load (do the math!)
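To spell out the math in that last item, here is a minimal sketch. It assumes both feeds have the same rated capacity, that loads are expressed as a fraction of that capacity, and that every server is dual-attached and shifts its full draw onto the surviving feed. The function names are hypothetical, purely for illustration.

```python
def failover_load(load_a: float, load_b: float) -> float:
    """Load on the surviving feed, as a fraction of its rated capacity,
    after the other feed fails.

    Assumption: every server is dual-attached, so the whole draw from
    the failed feed lands on the surviving one.
    """
    return load_a + load_b


def survives_single_feed_loss(load_a: float, load_b: float,
                              limit: float = 1.0) -> bool:
    """True if one feed alone can carry both loads without exceeding `limit`."""
    return failover_load(load_a, load_b) <= limit


if __name__ == "__main__":
    # The 60%/60% case from the list above, next to a 40%/40% case.
    for a, b in [(0.6, 0.6), (0.4, 0.4)]:
        total = failover_load(a, b)
        verdict = "OK" if survives_single_feed_loss(a, b) else "overloaded"
        print(f"A={a:.0%} B={b:.0%} -> surviving feed at {total:.0%} ({verdict})")
```

Under those assumptions, two feeds at 60% each leave the survivor at 120% of its rating, so steady-state load on each feed has to stay at or below 50% for the redundancy to mean anything.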
A classic piece of foo from a manufacturer: Dell supplies their low-end dual-power rackmount boxes with a Y-shaped IEC cable. Clearly, this is only suitable for non-redundant use, but I've seen plenty of them deployed in data centers by less-than-clueful admins.
On Mon, Nov 16, 2009 at 2:12 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
> On Mon, Nov 16, 2009 at 1:04 PM, Robert Schnabel <schnabelr@xxxxxxxxxxxx> wrote:
>>
>> So the short answer is yes, I have it running with
>> PostgreSQL and have not had any problems.
>>
>>
>> Have you unplugged the power cord a few times in the middle of heavy
>> write activity?
>>
>> ...Robert
>>
>> Nope. Forgive my ignorance but isn't that what a UPS is for anyway? Along
>> with a BBU controller.
>
> BBU controller, yes. UPS, no. I've seen more than one multi-million
> dollar hosting center go down from something as simple as a piece of
> wire flying into a power conditioner, shorting it out, and feeding
> back and blowing every single power conditioner and UPS AND the switch
> that allowed the diesel to come into the loop. All failed. Every
> machine lost power. One database server out of a few dozen came back
> up. In fact there were a lot of different dbm systems running in that
> center, and only the pg 7.2 version came back up unscathed.
>
> Because someone insisted on pulling the plug out from the back a dozen
> or so times to make sure it would come back up. PG saved our
> shorts and the asses they contain. Sad thing is I'm sure the other
> servers COULD have come back up if they had been running proper BBUs
> and hard drives that didn't lie about fsync, and an OS that enforced
> fsync properly, at least for scsi, at the time.
>
> Power supplies / UPSes fail far more often than one might think. And
> a db that doesn't come back up afterwards is not to be placed into
> production.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>