Re: understanding postgres issues/bottlenecks

david@xxxxxxx · Fri, 16 Jan 2009 00:59:53 -0800 (PST)

On Thu, 15 Jan 2009, Jean-David Beyer wrote:

M. Edward (Ed) Borasky wrote:
| Luke Lonergan wrote:
|> Not to mention the #1 cause of server faults in my experience: OS
|> kernel bug causes a crash.  Battery backup doesn't help you much there.
|>
|
| Well now ... that very much depends on where you *got* the server OS and
| how you administer it. If you're talking a correctly-maintained Windows
| 2003 Server installation, or a correctly-maintained Red Hat Enterprise
| Linux installation, or any other "branded" OS from Novell, Sun, HP, etc.,
| I'm guessing such crashes are much rarer than what you've experienced.
|
| And you're probably in pretty good shape with Debian stable and the RHEL
| respins like CentOS. I can't comment on Ubuntu server or any of the BSD
| family -- I've never worked with them. But you should be able to keep a
| "branded" server up for months, with the exception of applying security
| patches that require a reboot. And *those* can be *planned* outages!
|
| Where you *will* have some major OS risk is with testing-level software
| or "bleeding edge" Linux distros like Fedora. Quite frankly, I don't know
| why people run Fedora servers -- if it's Red Hat compatibility you want,
| there's CentOS.
|
Linux kernels seem to be pretty good these days. I ran Red Hat Linux 7.3
24/7 for over 6 months, and it was discontinued years ago. I recognize that
this is by no means a record. It did not crash after 6 months, but I
upgraded that box to CentOS 4 and it has been running that a long time. That
box has minor hardware problems that do not happen often enough to find the
real cause. But it stays up months at a time. All that box does is run BOINC
and a printer server (CUPS).

This machine does not crash, but it gets rebooted whenever a new kernel
comes out, and has been up almost a month. It runs RHEL5.

I would think Fedora's kernel would probably be OK, but the other bleeding
edge stuff I would not risk a serious server on.

I have been running kernel.org kernels in production for about 12 years
now (on what has now grown to a couple hundred servers), and I routinely
run from upgrade to upgrade with no crashes (I tend to upgrade every year
or so).

that being said, things happen. I have a set of firewalls running the
Checkpoint Secure Platform Linux distribution that locked up solidly a
couple of weeks after being put in place (the iptables firewalls that they
replaced had been humming along just fine under much heavier loads for
months).

the more mainstream your hardware is the safer you are (unfortunately very
few RAID cards are mainstream), but I've also found that compiling a
minimal kernel that supports only the stuff that I need contributes
to reliability.

but even with my experience, I would never architect anything with the
expectation that system crashes don't happen. I actually see more crashes
due to overheating (fans fail, AC units fail, etc) than I do from kernel
crashes.

not everything needs reliability. I am getting ready to build a pair of
postgres servers that will have all safety disabled. I will get the
redundancy I need by replicating between the pair, and if they both go
down (datacenter outage) it is very appropriate to lose the entire
contents of the system and reinitialize from scratch (in fact, every boot
of the system will do this).

but you need to think carefully about what you are doing when you disable 
the protection.
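
As a rough illustration of what checking for disabled protection could look
like, here is a minimal Python sketch. It assumes that "all safety disabled"
refers to the fsync, full_page_writes and synchronous_commit settings in
postgresql.conf; the post does not say which settings are meant, and the
helper name disabled_durability_settings is hypothetical.

```python
# Assumption: these three real postgresql.conf parameters are the "safety"
# switches in question. The original post does not name them.
DURABILITY_SETTINGS = ("fsync", "full_page_writes", "synchronous_commit")


def disabled_durability_settings(conf_text):
    """Return the durability-related settings that are explicitly turned off."""
    disabled = []
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        # postgresql.conf accepts both "name = value" and "name value"
        name, _, value = line.replace("=", " ", 1).partition(" ")
        value = value.strip().strip("'").lower()
        if name.lower() in DURABILITY_SETTINGS and value in ("off", "false", "no", "0"):
            disabled.append(name.lower())
    return disabled


# Illustrative configuration text, not taken from any real server.
example_conf = """
fsync = off                 # no forced flush to disk
full_page_writes = off      # no torn-page protection
synchronous_commit = off    # commit returns before WAL is flushed
shared_buffers = 512MB
"""

print(disabled_durability_settings(example_conf))
```

Running something like this against a server's configuration before it goes
into service is one way to confirm that the protection is off only on the
machines where losing everything is acceptable.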

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)