Simon Riggs wrote:
On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
Is there a way to restrict how much load a PostgreSQL server can take
before dropping queries in order to safeguard the server? I was
looking at the login.conf(5) man page, and while it allows me to limit
by processor time, this does not seem to fit my specific needs.
Essentially, I am looking for functionality similar to what Sendmail
and Apache have: once the system load reaches a certain defined limit,
the daemon drops tasks until it can resume normal operation.
Sounds great... could you give more shape to the idea, so people can
comment on it?
What limit? Measured how? Normal operation is what?
Drop what? How to tell?
Let's use Apache as an example: the Apache::LoadAvgLimit mod_perl
module allows one to limit requests based on the system load averages.
Here is an example of the configuration one would find:
<Location /perl>
    PerlInitHandler Apache::LoadAvgLimit
    PerlSetVar LoadAvgLimit_1 3.00
    PerlSetVar LoadAvgLimit_5 2.00
    PerlSetVar LoadAvgLimit_15 1.50
    PerlSetVar LoadAvgRetryAfter 120
</Location>
The end state is simple: once the one-minute load average moves above
3.00, the web server will not process the CGI scripts or mod_perl
applications under that directory. Instead it will return a 503 error,
saving the system from being crushed by ever-increasing load averages.
Only once the load average drops back below the defined limits will the
server process requests as normal. This is not necessarily the nicest or
cleanest way of doing things, but it does allow the Apache web server to
prevent a collapse.
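
For what it's worth, a rough sketch of the same policy applied in front
of the database (this is not an existing PostgreSQL feature) could look
like the Python below. The LOADAVG_LIMIT_1 constant, the ServerTooBusy
exception and the run_query_if_idle() helper are invented for
illustration, and os.getloadavg() is only available on Unix-like hosts:

    import os
    import psycopg2  # assumed DB-API driver; any would do

    # Hypothetical knobs mirroring the Apache::LoadAvgLimit settings above.
    LOADAVG_LIMIT_1 = 3.00
    RETRY_AFTER = 120          # seconds the caller should wait before retrying

    class ServerTooBusy(Exception):
        """Raised instead of running the query, much like Apache's 503."""

    def run_query_if_idle(conn, sql, params=None):
        # os.getloadavg() returns the 1, 5 and 15 minute load averages.
        load_1, _, _ = os.getloadavg()
        if load_1 > LOADAVG_LIMIT_1:
            raise ServerTooBusy("load %.2f > %.2f, retry in %d seconds"
                                % (load_1, LOADAVG_LIMIT_1, RETRY_AFTER))
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()

The caller would treat ServerTooBusy the way a browser treats a 503:
wait a while and try again later.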
There are ways of restricting the size of files, number of concurrent
processes and even memory being used by a daemon. This can be done
through ulimit or the login.conf file if your system supports it.
However, there is no way to restrict based on load averages, only
processor time, which is ineffective for a perpetually running daemon
like PostgreSQL.
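
To make that concrete, Python's resource module exposes the same kind of
per-process limits that ulimit and login.conf set (the constants and the
values below are purely illustrative and vary by platform). None of them
is expressed in terms of load average, and RLIMIT_CPU counts total CPU
seconds, which a long-lived postmaster will eventually exhaust no matter
how idle the box is:

    import resource

    # The kinds of per-process limits ulimit/login.conf can impose.
    # Nothing here says "shed work when the machine is busy".
    resource.setrlimit(resource.RLIMIT_CPU,   (3600, 3600))        # CPU seconds
    resource.setrlimit(resource.RLIMIT_FSIZE, (1 << 30, 1 << 30))  # max file size
    resource.setrlimit(resource.RLIMIT_DATA,  (1 << 30, 1 << 30))  # data segment size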
While not necessarily common on my servers, I have witnessed some fairly
high load averages which may have led to the machine going down outright.
Any help on this matter would be appreciated.
You can limit the number of connections overall?
Limiting concurrent connections is not always the solution to the
problem. Problems can occur when outside conditions cause a major spike
in activity that would be considered abnormal.
For example, with Apache::DBI or pgpool the DBMS may be required to
spawn a great number of child processes in a short period of time. This
in turn causes a major spike in processor load, and if those children
are immediately handed high-demand queries, the load can keep climbing
until the server buckles.
I've seen this behavior before when restarting the web server under
heavy load. Apache goes from zero connections to a solid 120, causing
PostgreSQL to spawn that many children in a short period of time just
to keep up with the demand.
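
The usual way to soften that fork storm is to hold a fixed set of
already-open connections, which is roughly what pgpool provides and what
Apache::DBI approximates with its persistent handles. A toy sketch of
the idea, with an invented FixedPool class and psycopg2 standing in for
whatever driver is in use:

    import queue
    import psycopg2  # assumed driver

    class FixedPool:
        """Toy connection pool: open a fixed number of backends up front so
        a web-tier restart reuses them instead of forcing PostgreSQL to fork
        a fresh backend for every incoming request."""

        def __init__(self, size, **connect_args):
            self._conns = queue.Queue(maxsize=size)
            for _ in range(size):
                self._conns.put(psycopg2.connect(**connect_args))

        def acquire(self, timeout=5):
            # Blocks briefly instead of asking the postmaster for backend #121.
            return self._conns.get(timeout=timeout)

        def release(self, conn):
            self._conns.put(conn)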
PostgreSQL pays a penalty when spawning a new backend and accepting a
connection; doing so takes resources at every level and slows things
down. Meanwhile, clients on the web server are hitting the database at
an accelerated rate because of the slowed responses, placing even more
demand on both machines.
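
One thing the clients themselves can do to dampen that feedback loop is
back off between retries instead of hammering the slow server. A
hypothetical helper along these lines (the with_backoff() name and its
defaults are made up):

    import random
    import time

    def with_backoff(run_query, attempts=5, base_delay=0.5):
        """Retry a refused or failed query with exponential backoff and
        jitter, so impatient clients do not pile onto a stalled database."""
        for attempt in range(attempts):
            try:
                return run_query()
            except Exception:   # a real version would catch only the driver's errors
                if attempt == attempts - 1:
                    raise
                # 0.5s, 1s, 2s, ... plus jitter so clients do not retry in lockstep.
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))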
In most cases the processor is taxed and the load average high enough to
cause a noticeable delay even at a console, but the system will
generally recover, slowly, or in rare cases crash outright. In such a
circumstance, having the database server refuse queries while the sanity
of the system is at stake might come in handy.
Of course, I am not blaming PostgreSQL; there are probably some
instabilities in the AMD64 port of FreeBSD 5.2.1 on dual-processor
systems that increase the chance of failure instead of recovery.
However, if there were a way to prevent the process from reaching
those limits, it might avoid the problem altogether.
Martin Foster
Creator/Designer Ethereal Realms
martin@xxxxxxxxxxxxxxxxxxx